I tried to run with DelayedAutoRebalancer. When a participant host
(localhost_12913) was killed, shards hosted on it were not moved, which is
expected.
The external view for the resource looked like this:
ExternalView for test:
{
"id" : "test",
"mapFields" : {
"test_0" : {
"localhost_12914" : "ONLINE"
},
"test_1" : {
"localhost_12914" : "ONLINE",
"localhost_12915" : "ONLINE"
},
"test_2" : {
"localhost_12915" : "ONLINE"
}
},
"listFields" : {
},
"simpleFields" : {
"BUCKET_SIZE" : "0",
"IDEAL_STATE_MODE" : "AUTO_REBALANCE",
"NUM_PARTITIONS" : "3",
"REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
"REBALANCE_MODE" : "FULL_AUTO",
"REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
"REPLICAS" : "2",
"STATE_MODEL_DEF_REF" : "OnlineOffline",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
}
}
However, when I restarted the participant, it did not receive any new
transition requests, and the external view became:
ExternalView for test:
{
"id" : "test",
"mapFields" : {
"test_0" : {
"localhost_12913" : "OFFLINE",
"localhost_12914" : "ONLINE"
},
"test_1" : {
"localhost_12914" : "ONLINE",
"localhost_12915" : "ONLINE"
},
"test_2" : {
"localhost_12913" : "OFFLINE",
"localhost_12915" : "ONLINE"
}
},
"listFields" : {
},
"simpleFields" : {
"BUCKET_SIZE" : "0",
"IDEAL_STATE_MODE" : "AUTO_REBALANCE",
"NUM_PARTITIONS" : "3",
"REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
"REBALANCE_MODE" : "FULL_AUTO",
"REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
"REPLICAS" : "2",
"STATE_MODEL_DEF_REF" : "OnlineOffline",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
}
}
Is this the expected behavior?
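
For reference, the resource was created roughly like this (a paraphrased sketch
using HelixAdmin; the ZK address and cluster name are placeholders, and exact
method overloads may differ across Helix versions):

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;
import org.apache.helix.model.IdealState.RebalanceMode;

public class CreateTestResource {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");  // placeholder ZK address
    String cluster = "MYCLUSTER";                           // placeholder cluster name
    // 3 partitions, OnlineOffline state model, FULL_AUTO rebalance mode
    admin.addResource(cluster, "test", 3, "OnlineOffline",
        RebalanceMode.FULL_AUTO.name());
    // Switch the resource to the delayed rebalancer
    IdealState idealState = admin.getResourceIdealState(cluster, "test");
    idealState.setRebalancerClassName(
        "org.apache.helix.controller.rebalancer.DelayedAutoRebalancer");
    admin.setResourceIdealState(cluster, "test", idealState);
    // 2 replicas per partition
    admin.rebalance(cluster, "test", 2);
  }
}
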
On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu <[email protected]> wrote:
> Great, thank you for the prompt reply.
>
> Thanks,
> Bo
>
> On Tue, Jan 23, 2018 at 1:47 PM, kishore g <[email protected]> wrote:
>
>>
>> 1. Yes, you can set the max-transitions constraint at per-partition,
>> per-instance, or per-resource scope. There is a Helix admin API to set the
>> constraint; I don't have it handy, but a rough sketch from memory is included below.
>> 2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions
>> that were on the host and are still present in the IdealState. If a partition has
>> been removed from the IdealState, it will send an OFFLINE->DROPPED transition instead.
>> 3. Right. Expiry is the same as a restart. The only difference is that
>> with expiry, Helix calls the reset method on the state model, where one can plug in
>> custom behavior.
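>>
>> For 1, something along these lines should work (a sketch based on the Helix
>> message-constraint/throttling tutorial; the ZK address, cluster name, and
>> constraint id below are placeholders, and the attribute names may vary
>> slightly across Helix versions):
>>
>> import org.apache.helix.HelixAdmin;
>> import org.apache.helix.manager.zk.ZKHelixAdmin;
>> import org.apache.helix.model.ClusterConstraints.ConstraintType;
>> import org.apache.helix.model.builder.ConstraintItemBuilder;
>>
>> public class ThrottleTransitions {
>>   public static void main(String[] args) {
>>     HelixAdmin admin = new ZKHelixAdmin("localhost:2181");  // placeholder
>>     ConstraintItemBuilder builder = new ConstraintItemBuilder();
>>     // At most one pending STATE_TRANSITION message per instance for the
>>     // resource "test"; add or remove attributes to change the scope.
>>     builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
>>         .addConstraintAttribute("INSTANCE", ".*")
>>         .addConstraintAttribute("RESOURCE", "test")
>>         .addConstraintAttribute("CONSTRAINT_VALUE", "1");
>>     admin.setConstraint("MYCLUSTER", ConstraintType.MESSAGE_CONSTRAINT,
>>         "maxOneTransitionPerInstance", builder.build());
>>   }
>> }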
>>
>>
>>
>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <[email protected]> wrote:
>>
>>> Thanks Kishore & Lei!
>>>
>>> It's a good point to rely on the data in a local partition to decide whether
>>> a bootstrap is needed or catching up is good enough.
>>>
>>> A few more questions.
>>>
>>> 1. Is there a way to allow at most one transition for a partition at a
>>> time? During a state transition, a participant needs to set up the proper
>>> replication upstream for itself (in the case where it is transitioning to
>>> Slave) or for the other replicas (in the case where it is transitioning to
>>> Master). So the participant needs to learn the ip:port of the other replicas
>>> in the cluster (see the routing-table sketch after question 3). Disallowing
>>> concurrent transitions for a partition would make this much easier.
>>>
>>> 2. When a participant restarts, I assume it will connect to ZK with a
>>> new session id. With DelayedAutoRebalancer, Helix will not move
>>> replicas away from the participant, but it will promote some Slave
>>> replicas on other hosts to be the new Masters. Once the restarted host is
>>> back, will Helix send "OFFLINE -> SLAVE" transition requests to it for all
>>> the partitions that were on this participant before the restart?
>>>
>>> 3. When the ZK session expires on a participant (no restart), Helix
>>> will behave the same way, i.e., send "OFFLINE->SLAVE" for all partitions
>>> to the participant once it reconnects to ZK, right?
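>>>
>>> (On the ip:port point in question 1: I was planning to look up replica
>>> endpoints from the external view with a routing table, roughly like the
>>> sketch below; the resource, partition, and state names are just examples.)
>>>
>>> import java.util.List;
>>>
>>> import org.apache.helix.HelixManager;
>>> import org.apache.helix.model.InstanceConfig;
>>> import org.apache.helix.spectator.RoutingTableProvider;
>>>
>>> public class ReplicaLocator {
>>>   private final RoutingTableProvider routingTable = new RoutingTableProvider();
>>>
>>>   // Register once against an already-connected HelixManager; the routing
>>>   // table then tracks external-view changes automatically.
>>>   public ReplicaLocator(HelixManager manager) throws Exception {
>>>     manager.addExternalViewChangeListener(routingTable);
>>>   }
>>>
>>>   // Instances currently hosting 'partition' of 'resource' in 'state'.
>>>   public List<InstanceConfig> locate(String resource, String partition, String state) {
>>>     return routingTable.getInstances(resource, partition, state);
>>>   }
>>> }
>>>
>>> // Example usage:
>>> //   for (InstanceConfig ic : locator.locate("test", "test_0", "MASTER")) {
>>> //     String endpoint = ic.getHostName() + ":" + ic.getPort();
>>> //     // set up replication against 'endpoint'
>>> //   }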
>>>
>>> Thanks,
>>> Bo
>>>
>>> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <[email protected]> wrote:
>>>
>>>> Relying on Helix reusing the same StateModel instance might make the
>>>> model too rigid and tied to the current implementation in Helix. Let's not
>>>> expose that to the clients.
>>>>
>>>> Helix internally carries over the previous partition assignment during
>>>> startup but sets the state to the initial state (OFFLINE in this case) by
>>>> default. If the client really needs to know what the previous state was, we
>>>> can provide a hook for the client to compute the initial state. In any case,
>>>> let's hear more from Bo before making any changes.
>>>>
>>>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <[email protected]> wrote:
>>>>
>>>>> Hi, Bo
>>>>>
>>>>>
>>>>> As Kishore commented, your offline->slave state transition callback
>>>>> needs some logic to determine whether a bootstrap or a catch-up is needed
>>>>> to transition a replica to slave. A common way is to persist the data
>>>>> version of a local partition somewhere, and during offline->slave, compare
>>>>> the local version (if there is one) with the current Master's version to
>>>>> determine whether a bootstrap (if the local version is null or too old) or
>>>>> a catch-up is needed (see the sketch further below).
>>>>>
>>>>>
>>>>> There is one more difference in how Helix handles a participant
>>>>> restart vs. a ZK session expiry. When a participant starts (or restarts),
>>>>> it creates a new StateModel (by calling createNewStateModel() in your
>>>>> StateModelFactory) for each partition. However, if a participant loses its
>>>>> ZK session and comes back (with a new session), it will reuse the existing
>>>>> StateModel for partitions that were there before instead of creating new
>>>>> ones. You may leverage this to tell whether a participant has been
>>>>> restarted or has just re-established its ZK connection.
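>>>>>
>>>>> Putting the two points above together, a factory/state-model sketch could
>>>>> look like the following (the version-check helpers at the bottom are
>>>>> hypothetical placeholders for your own storage layer, and the factory
>>>>> method signature differs slightly across Helix versions):
>>>>>
>>>>> import org.apache.helix.NotificationContext;
>>>>> import org.apache.helix.model.Message;
>>>>> import org.apache.helix.participant.statemachine.StateModel;
>>>>> import org.apache.helix.participant.statemachine.StateModelFactory;
>>>>> import org.apache.helix.participant.statemachine.StateModelInfo;
>>>>> import org.apache.helix.participant.statemachine.Transition;
>>>>>
>>>>> @StateModelInfo(initialState = "OFFLINE", states = { "MASTER", "SLAVE", "OFFLINE" })
>>>>> class ReplicaStateModel extends StateModel {
>>>>>   // Hypothetical threshold: how far behind the master we still allow catch-up.
>>>>>   private static final long MAX_CATCHUP_LAG = 10000L;
>>>>>   private final String partition;
>>>>>
>>>>>   ReplicaStateModel(String partition) {
>>>>>     this.partition = partition;
>>>>>   }
>>>>>
>>>>>   @Transition(from = "OFFLINE", to = "SLAVE")
>>>>>   public void onBecomeSlaveFromOffline(Message message, NotificationContext context) {
>>>>>     Long local = readLocalVersion(partition);     // hypothetical: version persisted with local data
>>>>>     long master = fetchMasterVersion(partition);  // hypothetical: ask the current MASTER
>>>>>     if (local == null || master - local > MAX_CATCHUP_LAG) {
>>>>>       bootstrapFromMaster(partition);             // hypothetical: full copy
>>>>>     } else {
>>>>>       catchUpFromMaster(partition, local);        // hypothetical: replay only the delta
>>>>>     }
>>>>>   }
>>>>>
>>>>>   @Transition(from = "SLAVE", to = "MASTER")
>>>>>   public void onBecomeMasterFromSlave(Message message, NotificationContext context) {
>>>>>     // promote the local replica, repoint other replicas' replication, etc.
>>>>>   }
>>>>>
>>>>>   @Transition(from = "SLAVE", to = "OFFLINE")
>>>>>   public void onBecomeOfflineFromSlave(Message message, NotificationContext context) {
>>>>>     // stop replication, release resources
>>>>>   }
>>>>>
>>>>>   @Override
>>>>>   public void reset() {
>>>>>     // Invoked by Helix on ZK session expiry, when this instance is reused.
>>>>>   }
>>>>>
>>>>>   // --- hypothetical storage-layer hooks ---
>>>>>   private Long readLocalVersion(String partition) { return null; }
>>>>>   private long fetchMasterVersion(String partition) { return 0L; }
>>>>>   private void bootstrapFromMaster(String partition) { }
>>>>>   private void catchUpFromMaster(String partition, long fromVersion) { }
>>>>> }
>>>>>
>>>>> class ReplicaStateModelFactory extends StateModelFactory<ReplicaStateModel> {
>>>>>   // Called on participant (re)start for each assigned partition; on a plain
>>>>>   // ZK session re-establishment the existing instance is reused instead.
>>>>>   // Note: older Helix versions use createNewStateModel(String partitionName).
>>>>>   @Override
>>>>>   public ReplicaStateModel createNewStateModel(String resourceName, String partitionName) {
>>>>>     return new ReplicaStateModel(partitionName);
>>>>>   }
>>>>> }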
>>>>>
>>>>>
>>>>> In addition, the delayed feature in DelayedAutoRebalancer is a
>>>>> little different than what you may expect. When you lose a participant
>>>>> (e.g., crashed, in maintenance), you lose one replica for some partitions.
>>>>> In this situation, Helix would normally bring up a new replica on some
>>>>> other live node immediately to maintain the required replica count.
>>>>> However, this may have a performance impact, since bringing up a new
>>>>> replica can require a data bootstrap on the new node. If you expect the
>>>>> original participant to be back online soon and you can tolerate losing one
>>>>> or more replicas in the short term, you can set a delay time, during which
>>>>> Helix will not bring up a new replica. Hope that makes it clearer.
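>>>>>
>>>>> The delay itself is configured at the cluster level, roughly like the
>>>>> sketch below (written from memory; please double-check the ClusterConfig
>>>>> setter names against the Helix version you are running):
>>>>>
>>>>> import org.apache.helix.ConfigAccessor;
>>>>> import org.apache.helix.HelixManager;
>>>>> import org.apache.helix.model.ClusterConfig;
>>>>>
>>>>> final class DelayConfig {
>>>>>   // Tolerate a participant outage of up to 5 minutes before Helix brings
>>>>>   // up replacement replicas on other live nodes.
>>>>>   static void setRebalanceDelay(HelixManager manager, String clusterName) {
>>>>>     ConfigAccessor accessor = manager.getConfigAccessor();
>>>>>     ClusterConfig config = accessor.getClusterConfig(clusterName);
>>>>>     config.setRebalanceDelayTime(5 * 60 * 1000L);  // milliseconds
>>>>>     accessor.setClusterConfig(clusterName, config);
>>>>>   }
>>>>> }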
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Lei
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Lei Xia*
>>>>>
>>>>>
>>>>> Data Infra/Helix
>>>>>
>>>>> [email protected]
>>>>> www.linkedin.com/in/lxia1
>>>>> ------------------------------
>>>>> *From:* Bo Liu <[email protected]>
>>>>> *Sent:* Monday, January 22, 2018 11:12:48 PM
>>>>> *To:* [email protected]
>>>>> *Subject:* differentiate between bootstrap and a soft failure
>>>>>
>>>>> Hi There,
>>>>>
>>>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How
>>>>> can a participant differentiate between these two cases:
>>>>>
>>>>> 1) When a participant first joins a cluster, it will be asked to
>>>>> transition from OFFLINE to SLAVE. Since the participant doesn't have any
>>>>> data for the partition, it needs to bootstrap and download data from
>>>>> another participant or a data source.
>>>>> 2) When a participant loses its ZK session, the controller will
>>>>> automatically mark its replicas OFFLINE in ZK. If the participant manages
>>>>> to establish a new session to ZK before the delay threshold, the controller
>>>>> will send it a request to switch from OFFLINE to SLAVE. In this case, the
>>>>> participant already has the data for the partition, so it doesn't need to
>>>>> bootstrap from other data sources.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Bo
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Bo
>>>
>>>
>>
>
>
> --
> Best regards,
> Bo
>
>
--
Best regards,
Bo