I tried to run with DelayedAutoRebalancer. When a participant host
(localhost_12913) was killed, shards hosted on it were not moved, which is
expected.
The external view for the resource looked like this:
ExternalView for test:
{
"id" : "test",
"mapFields" : {
"test_0" : {
"localhost_12914" : "ONLINE"
},
"test_1" : {
"localhost_12914" : "ONLINE",
"localhost_12915" : "ONLINE"
},
"test_2" : {
"localhost_12915" : "ONLINE"
}
},
"listFields" : {
},
"simpleFields" : {
"BUCKET_SIZE" : "0",
"IDEAL_STATE_MODE" : "AUTO_REBALANCE",
"NUM_PARTITIONS" : "3",
"REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
"REBALANCE_MODE" : "FULL_AUTO",
"REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
"REPLICAS" : "2",
"STATE_MODEL_DEF_REF" : "OnlineOffline",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
}
}
However, when I restarted the participant, it did not receive any new
transition requests, and the external view became:
ExternalView for test:
{
"id" : "test",
"mapFields" : {
"test_0" : {
"localhost_12913" : "OFFLINE",
"localhost_12914" : "ONLINE"
},
"test_1" : {
"localhost_12914" : "ONLINE",
"localhost_12915" : "ONLINE"
},
"test_2" : {
"localhost_12913" : "OFFLINE",
"localhost_12915" : "ONLINE"
}
},
"listFields" : {
},
"simpleFields" : {
"BUCKET_SIZE" : "0",
"IDEAL_STATE_MODE" : "AUTO_REBALANCE",
"NUM_PARTITIONS" : "3",
"REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
"REBALANCE_MODE" : "FULL_AUTO",
"REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
"REPLICAS" : "2",
"STATE_MODEL_DEF_REF" : "OnlineOffline",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
}
}
Is this the expected behavior?
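
For reference, the resource was created roughly like this (a paraphrased sketch
using HelixAdmin; the ZK address and cluster name are placeholders, and exact
method overloads may differ across Helix versions):

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;
import org.apache.helix.model.IdealState.RebalanceMode;

public class CreateTestResource {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");  // placeholder ZK address
    String cluster = "MYCLUSTER";                           // placeholder cluster name
    // 3 partitions, OnlineOffline state model, FULL_AUTO rebalance mode
    admin.addResource(cluster, "test", 3, "OnlineOffline",
        RebalanceMode.FULL_AUTO.name());
    // Switch the resource to the delayed rebalancer
    IdealState idealState = admin.getResourceIdealState(cluster, "test");
    idealState.setRebalancerClassName(
        "org.apache.helix.controller.rebalancer.DelayedAutoRebalancer");
    admin.setResourceIdealState(cluster, "test", idealState);
    // 2 replicas per partition
    admin.rebalance(cluster, "test", 2);
  }
}
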
On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu <[email protected]> wrote:
> Great, thank you for the prompt reply.
>
> Thanks,
> Bo
>
> On Tue, Jan 23, 2018 at 1:47 PM, kishore g <[email protected]> wrote:
>
>>
>> 1. Yes, you can set the max-transitions constraint at per-partition,
>> per-instance, or per-resource scope. There is a Helix admin API to set the
>> constraint; I don't have it handy, but a rough sketch from memory is included below.
>> 2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions
>> that were on the host and are still present in the IdealState. If a partition has
>> been removed from the IdealState, it will send an OFFLINE->DROPPED transition instead.
>> 3. Right. Expiry is the same as a restart. The only difference is that
>> with expiry, Helix calls the reset method on the state model, where one can plug in
>> custom behavior.
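>>
>> For 1, something along these lines should work (a sketch based on the Helix
>> message-constraint/throttling tutorial; the ZK address, cluster name, and
>> constraint id below are placeholders, and the attribute names may vary
>> slightly across Helix versions):
>>
>> import org.apache.helix.HelixAdmin;
>> import org.apache.helix.manager.zk.ZKHelixAdmin;
>> import org.apache.helix.model.ClusterConstraints.ConstraintType;
>> import org.apache.helix.model.builder.ConstraintItemBuilder;
>>
>> public class ThrottleTransitions {
>>   public static void main(String[] args) {
>>     HelixAdmin admin = new ZKHelixAdmin("localhost:2181");  // placeholder
>>     ConstraintItemBuilder builder = new ConstraintItemBuilder();
>>     // At most one pending STATE_TRANSITION message per instance for the
>>     // resource "test"; add or remove attributes to change the scope.
>>     builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
>>         .addConstraintAttribute("INSTANCE", ".*")
>>         .addConstraintAttribute("RESOURCE", "test")
>>         .addConstraintAttribute("CONSTRAINT_VALUE", "1");
>>     admin.setConstraint("MYCLUSTER", ConstraintType.MESSAGE_CONSTRAINT,
>>         "maxOneTransitionPerInstance", builder.build());
>>   }
>> }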
>>
>>
>>
>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <[email protected]> wrote:
>>
>>> Thanks Kishore & Lei!
>>>
>>> It's a good point to rely on the data in a local partition to decide whether
>>> a bootstrap is needed or catching up is good enough.
>>>
>>> A few more questions.
>>>
>>> 1. Is there a way to allow at most one transition for a partition at a
>>> time? During a state transition, a participant needs to set up the proper
>>> replication upstream for itself (in the case where it is transitioning to
>>> Slave) or for the other replicas (in the case where it is transitioning to
>>> Master). So the participant needs to learn the ip:port of the other replicas
>>> in the cluster (see the routing-table sketch after question 3). Disallowing
>>> concurrent transitions for a partition would make this much easier.
>>>
>>> 2. When a participant restarts, I assume it will connect to ZK with a
>>> new session id. With DelayedAutoRebalancer, Helix will not move
>>> replicas away from the participant, but it will promote some Slave
>>> replicas on other hosts to be the new Masters. Once the restarted host is
>>> back, will Helix send "OFFLINE -> SLAVE" transition requests to it for all
>>> the partitions that were on this participant before the restart?
>>>
>>> 3. When the ZK session expires on a participant (no restart), Helix
>>> will behave the same way, i.e., send "OFFLINE->SLAVE" for all partitions
>>> to the participant once it reconnects to ZK, right?
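>>>
>>> (On the ip:port point in question 1: I was planning to look up replica
>>> endpoints from the external view with a routing table, roughly like the
>>> sketch below; the resource, partition, and state names are just examples.)
>>>
>>> import java.util.List;
>>>
>>> import org.apache.helix.HelixManager;
>>> import org.apache.helix.model.InstanceConfig;
>>> import org.apache.helix.spectator.RoutingTableProvider;
>>>
>>> public class ReplicaLocator {
>>>   private final RoutingTableProvider routingTable = new RoutingTableProvider();
>>>
>>>   // Register once against an already-connected HelixManager; the routing
>>>   // table then tracks external-view changes automatically.
>>>   public ReplicaLocator(HelixManager manager) throws Exception {
>>>     manager.addExternalViewChangeListener(routingTable);
>>>   }
>>>
>>>   // Instances currently hosting 'partition' of 'resource' in 'state'.
>>>   public List<InstanceConfig> locate(String resource, String partition, String state) {
>>>     return routingTable.getInstances(resource, partition, state);
>>>   }
>>> }
>>>
>>> // Example usage:
>>> //   for (InstanceConfig ic : locator.locate("test", "test_0", "MASTER")) {
>>> //     String endpoint = ic.getHostName() + ":" + ic.getPort();
>>> //     // set up replication against 'endpoint'
>>> //   }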
>>>
>>> Thanks,
>>> Bo
>>>
>>> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <[email protected]> wrote:
>>>
>>>> Relying on Helix reusing the same StateModel instance might make the
>>>> model too rigid and tied to the current implementation in Helix. Let's not
>>>> expose that to the clients.
>>>>
>>>> Helix internally carries over the previous partition assignment during
>>>> startup but sets the state to the initial state (OFFLINE in this case) by
>>>> default. If the client really needs to know what the previous state was, we
>>>> can provide a hook for the client to compute the initial state. In any case,
>>>> let's hear more from Bo before making any changes.
>>>>
>>>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <[email protected]> wrote:
>>>>
>>>>> Hi, Bo
>>>>>
>>>>>
>>>>> As Kishore commented, your offline->slave state transition callback
>>>>> needs some logic to determine whether a bootstrap or a catch-up is needed
>>>>> to transition a replica to slave. A common way is to persist the data
>>>>> version of a local partition somewhere, and during offline->slave, compare
>>>>> the local version (if there is one) with the current Master's version to
>>>>> determine whether a bootstrap (if the local version is null or too old) or
>>>>> a catch-up is needed (see the sketch further below).
>>>>>
>>>>>
>>>>> There is one more difference in how Helix handles a participant
>>>>> restart vs. a ZK session expiry. When a participant starts (or restarts),
>>>>> it creates a new StateModel (by calling createNewStateModel() in your
>>>>> StateModelFactory) for each partition. However, if a participant loses its
>>>>> ZK session and comes back (with a new session), it will reuse the existing
>>>>> StateModel for partitions that were there before instead of creating new
>>>>> ones. You may leverage this to tell whether a participant has been
>>>>> restarted or has just re-established its ZK connection.
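>>>>>
>>>>> Putting the two points above together, a factory/state-model sketch could
>>>>> look like the following (the version-check helpers at the bottom are
>>>>> hypothetical placeholders for your own storage layer, and the factory
>>>>> method signature differs slightly across Helix versions):
>>>>>
>>>>> import org.apache.helix.NotificationContext;
>>>>> import org.apache.helix.model.Message;
>>>>> import org.apache.helix.participant.statemachine.StateModel;
>>>>> import org.apache.helix.participant.statemachine.StateModelFactory;
>>>>> import org.apache.helix.participant.statemachine.StateModelInfo;
>>>>> import org.apache.helix.participant.statemachine.Transition;
>>>>>
>>>>> @StateModelInfo(initialState = "OFFLINE", states = { "MASTER", "SLAVE", "OFFLINE" })
>>>>> class ReplicaStateModel extends StateModel {
>>>>>   // Hypothetical threshold: how far behind the master we still allow catch-up.
>>>>>   private static final long MAX_CATCHUP_LAG = 10000L;
>>>>>   private final String partition;
>>>>>
>>>>>   ReplicaStateModel(String partition) {
>>>>>     this.partition = partition;
>>>>>   }
>>>>>
>>>>>   @Transition(from = "OFFLINE", to = "SLAVE")
>>>>>   public void onBecomeSlaveFromOffline(Message message, NotificationContext context) {
>>>>>     Long local = readLocalVersion(partition);     // hypothetical: version persisted with local data
>>>>>     long master = fetchMasterVersion(partition);  // hypothetical: ask the current MASTER
>>>>>     if (local == null || master - local > MAX_CATCHUP_LAG) {
>>>>>       bootstrapFromMaster(partition);             // hypothetical: full copy
>>>>>     } else {
>>>>>       catchUpFromMaster(partition, local);        // hypothetical: replay only the delta
>>>>>     }
>>>>>   }
>>>>>
>>>>>   @Transition(from = "SLAVE", to = "MASTER")
>>>>>   public void onBecomeMasterFromSlave(Message message, NotificationContext context) {
>>>>>     // promote the local replica, repoint other replicas' replication, etc.
>>>>>   }
>>>>>
>>>>>   @Transition(from = "SLAVE", to = "OFFLINE")
>>>>>   public void onBecomeOfflineFromSlave(Message message, NotificationContext context) {
>>>>>     // stop replication, release resources
>>>>>   }
>>>>>
>>>>>   @Override
>>>>>   public void reset() {
>>>>>     // Invoked by Helix on ZK session expiry, when this instance is reused.
>>>>>   }
>>>>>
>>>>>   // --- hypothetical storage-layer hooks ---
>>>>>   private Long readLocalVersion(String partition) { return null; }
>>>>>   private long fetchMasterVersion(String partition) { return 0L; }
>>>>>   private void bootstrapFromMaster(String partition) { }
>>>>>   private void catchUpFromMaster(String partition, long fromVersion) { }
>>>>> }
>>>>>
>>>>> class ReplicaStateModelFactory extends StateModelFactory<ReplicaStateModel> {
>>>>>   // Called on participant (re)start for each assigned partition; on a plain
>>>>>   // ZK session re-establishment the existing instance is reused instead.
>>>>>   // Note: older Helix versions use createNewStateModel(String partitionName).
>>>>>   @Override
>>>>>   public ReplicaStateModel createNewStateModel(String resourceName, String partitionName) {
>>>>>     return new ReplicaStateModel(partitionName);
>>>>>   }
>>>>> }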
>>>>>
>>>>>
>>>>> In addition, the delayed feature in DelayedAutoRebalancer is a
>>>>> little different than what you may expect. When you lose a participant
>>>>> (e.g., crashed, in maintenance), you lose one replica for some partitions.
>>>>> In this situation, Helix would normally bring up a new replica on some
>>>>> other live node immediately to maintain the required replica count.
>>>>> However, this may have a performance impact, since bringing up a new
>>>>> replica can require a data bootstrap on the new node. If you expect the
>>>>> original participant to be back online soon and you can tolerate losing one
>>>>> or more replicas in the short term, you can set a delay time, during which
>>>>> Helix will not bring up a new replica. Hope that makes it clearer.
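>>>>>
>>>>> The delay itself is configured at the cluster level, roughly like the
>>>>> sketch below (written from memory; please double-check the ClusterConfig
>>>>> setter names against the Helix version you are running):
>>>>>
>>>>> import org.apache.helix.ConfigAccessor;
>>>>> import org.apache.helix.HelixManager;
>>>>> import org.apache.helix.model.ClusterConfig;
>>>>>
>>>>> final class DelayConfig {
>>>>>   // Tolerate a participant outage of up to 5 minutes before Helix brings
>>>>>   // up replacement replicas on other live nodes.
>>>>>   static void setRebalanceDelay(HelixManager manager, String clusterName) {
>>>>>     ConfigAccessor accessor = manager.getConfigAccessor();
>>>>>     ClusterConfig config = accessor.getClusterConfig(clusterName);
>>>>>     config.setRebalanceDelayTime(5 * 60 * 1000L);  // milliseconds
>>>>>     accessor.setClusterConfig(clusterName, config);
>>>>>   }
>>>>> }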
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Lei
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Lei Xia*
>>>>>
>>>>>
>>>>> Data Infra/Helix
>>>>>
>>>>> [email protected]
>>>>> www.linkedin.com/in/lxia1
>>>>> ------------------------------
>>>>> *From:* Bo Liu <[email protected]>
>>>>> *Sent:* Monday, January 22, 2018 11:12:48 PM
>>>>> *To:* [email protected]
>>>>> *Subject:* differentiate between bootstrap and a soft failure
>>>>>
>>>>> Hi There,
>>>>>
>>>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How
>>>>> can a participant differentiate between these two cases:
>>>>>
>>>>> 1) When a participant first joins a cluster, it will be asked to
>>>>> transition from OFFLINE to SLAVE. Since the participant doesn't have any
>>>>> data for the partition, it needs to bootstrap and download data from
>>>>> another participant or a data source.
>>>>> 2) When a participant loses its ZK session, the controller will
>>>>> automatically mark its replicas OFFLINE in ZK. If the participant manages
>>>>> to establish a new session to ZK before the delay threshold, the controller
>>>>> will send it a request to switch from OFFLINE to SLAVE. In this case, the
>>>>> participant already has the data for the partition, so it doesn't need to
>>>>> bootstrap from other data sources.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Bo
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Bo
>>>
>>>
>>
>
>
> --
> Best regards,
> Bo
>
>
--
Best regards,
Bo