Hi, Bo
That is not the expected behavior. Could you add (or replace) the
following configs in your IdealState? "MIN_ACTIVE_REPLICAS" tells
Helix the minimum number of replicas it must maintain. For example, if your
total replica count is 3 and you lose 2 instances, Helix will bring at least
1 more replica online immediately, regardless of the delay setting, to meet
the minimum replica requirement.
,"REBALANCE_STRATEGY":"org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy"
, "MIN_ACTIVE_REPLICAS":"2"
,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebalancer.DelayedAutoRebalancer"
Also, please add the following two configs to your ClusterConfig. In
particular, DELAY_REBALANCE_TIME specifies how long Helix should wait before
bringing up new replicas; e.g., if an instance goes down and does not come
back within 600000 ms, Helix will move all replicas on that instance to
other live instances.
"DELAY_REBALANCE_ENABLED" : "true",
"DELAY_REBALANCE_TIME" : "600000",
Please give it a try and let us know how it works. Apologies that we do not
have an updated manual on our website; we are working on updating all of our
developer manuals for the latest features, and they will be out soon.
Thanks
Lei
On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu <[email protected]> wrote:
> I tried to run with DelayedAutoRebalancer. When a participant host
> (localhost_12913) was killed, shards hosted on it were not moved, which is
> expected.
> And the external view for the resource looks like:
>
> ExternalView for test:
>
> {
>   "id" : "test",
>   "mapFields" : {
>     "test_0" : {
>       "localhost_12914" : "ONLINE"
>     },
>     "test_1" : {
>       "localhost_12914" : "ONLINE",
>       "localhost_12915" : "ONLINE"
>     },
>     "test_2" : {
>       "localhost_12915" : "ONLINE"
>     }
>   },
>   "listFields" : {
>   },
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "3",
>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>     "REPLICAS" : "2",
>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>   }
> }
>
> However, when I restarted the participant, the participant didn't get any
> new transition requests and the external view became:
>
> ExternalView for test:
>
> {
>   "id" : "test",
>   "mapFields" : {
>     "test_0" : {
>       "localhost_12913" : "OFFLINE",
>       "localhost_12914" : "ONLINE"
>     },
>     "test_1" : {
>       "localhost_12914" : "ONLINE",
>       "localhost_12915" : "ONLINE"
>     },
>     "test_2" : {
>       "localhost_12913" : "OFFLINE",
>       "localhost_12915" : "ONLINE"
>     }
>   },
>   "listFields" : {
>   },
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "3",
>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>     "REPLICAS" : "2",
>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>   }
> }
>
> I am wondering whether this is the expected behavior.
>
>
>
> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu <[email protected]> wrote:
>
>> Great, thank you for the prompt reply.
>>
>> Thanks,
>> Bo
>>
>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g <[email protected]> wrote:
>>
>>>
>>> 1. Yes, you can set the max-transitions constraint at the per-partition,
>>> per-instance, or per-resource scope. There is a HelixAdmin API to set the
>>> constraint; I don't have it handy, but a rough sketch follows below.
>>> 2. Yes, Helix will send OFFLINE->SLAVE transitions for all
>>> partitions that were on the host and are still present in the idealstate. If
>>> a partition is removed from the idealstate, Helix will send an
>>> OFFLINE->DROPPED transition instead.
>>> 3. Right. Expiry is the same as a restart. The only difference is that
>>> with expiry, Helix calls the reset method on the statemodel, where one can
>>> plug in custom behavior.
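>>>
>>> From memory, a rough, untested sketch of that constraint API (attribute
>>> names as in the Helix throttling tutorial; ZK address, cluster name, and
>>> constraint name are placeholders), limiting each partition to one
>>> in-flight state transition:
>>>
>>> import org.apache.helix.HelixAdmin;
>>> import org.apache.helix.manager.zk.ZKHelixAdmin;
>>> import org.apache.helix.model.ClusterConstraints.ConstraintType;
>>> import org.apache.helix.model.builder.ConstraintItemBuilder;
>>>
>>> HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>>> // Throttle STATE_TRANSITION messages: at most 1 per partition at a time.
>>> ConstraintItemBuilder builder = new ConstraintItemBuilder();
>>> builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
>>>        .addConstraintAttribute("PARTITION", ".*")
>>>        .addConstraintAttribute("CONSTRAINT_VALUE", "1");
>>> admin.setConstraint("myCluster", ConstraintType.MESSAGE_CONSTRAINT,
>>>     "maxOneTransitionPerPartition", builder.build());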
>>>
>>>
>>>
>>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <[email protected]> wrote:
>>>
>>>> Thanks Kishore & Lei!
>>>>
>>>> It's a good point to rely on the data in a local partition to decide if
>>>> a bootstrap is needed or catching up is good enough.
>>>>
>>>> A few more questions.
>>>>
>>>> 1. Is there a way to allow at most one transition for a partition at a
>>>> time? During a state transition, a participant needs to set up proper
>>>> replication upstream for itself (in the case where it is transitioning to
>>>> Slave) or for other replicas (in the case it is transitioning to Master). So the
>>>> participant needs to learn the ip:port of the other replicas in the cluster.
>>>> Allowing no concurrent transitions for a partition would make this much easier.
>>>>
>>>> 2. When a participant restarts, I assume it will connect to ZK with a
>>>> new session id. With DelayedAutoRebalancer, Helix will not move
>>>> replicas away from the participant, but it will promote some Slave
>>>> replicas on other hosts to be the new Masters. Once the restarted host is
>>>> back, will Helix send "OFFLINE -> SLAVE" transition requests to it for all
>>>> the partitions that were on this participant before the restart?
>>>>
>>>> 3. When the ZK session expires on a participant (no restart), Helix
>>>> will behave the same, i.e., sending "OFFLINE->SLAVE" for all partitions
>>>> to the participant once it reconnects to ZK, right?
>>>>
>>>> Thanks,
>>>> Bo
>>>>
>>>> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <[email protected]>
>>>> wrote:
>>>>
>>>>> Relying on Helix reusing the same statemodel instance might make
>>>>> the model too rigid and tied to the current implementation in Helix. Let's not
>>>>> expose that to the clients.
>>>>>
>>>>> Helix internally carries over the previous partition assignment
>>>>> during startup but sets the state to the initial state (OFFLINE in this case)
>>>>> by default. If the client really needs to know what the previous state
>>>>> was, we can provide a hook for the client to compute the initial state. In any
>>>>> case, let's hear more from Bo before making any changes.
>>>>>
>>>>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <[email protected]> wrote:
>>>>>
>>>>>> Hi, Bo
>>>>>>
>>>>>>
>>>>>> As Kishore commented, your offline->slave state transition callback
>>>>>> needs some logic to determine whether a bootstrap or a catch-up is needed
>>>>>> to bring a replica to slave. A common way is to persist the data version of
>>>>>> a local partition somewhere, and during offline->slave, compare the local
>>>>>> version (if there is one) with the current Master's version to determine
>>>>>> whether a bootstrap (if the version is null or too old) or a catch-up is
>>>>>> needed.
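>>>>>>
>>>>>> A minimal sketch of that check in the offline->slave callback
>>>>>> (readLocalVersion, fetchMasterVersion, tooOld, bootstrap, and catchUp
>>>>>> are hypothetical helpers you would implement on top of your storage):
>>>>>>
>>>>>> import org.apache.helix.NotificationContext;
>>>>>> import org.apache.helix.model.Message;
>>>>>> import org.apache.helix.participant.statemachine.StateModel;
>>>>>> import org.apache.helix.participant.statemachine.Transition;
>>>>>>
>>>>>> public class MyStateModel extends StateModel {
>>>>>>   @Transition(from = "OFFLINE", to = "SLAVE")
>>>>>>   public void onBecomeSlaveFromOffline(Message msg,
>>>>>>       NotificationContext context) {
>>>>>>     String partition = msg.getPartitionName();
>>>>>>     Long local = readLocalVersion(partition);    // null if no local data
>>>>>>     long master = fetchMasterVersion(partition); // ask the current Master
>>>>>>     if (local == null || tooOld(local, master)) {
>>>>>>       bootstrap(partition); // full download from a replica/data source
>>>>>>     } else {
>>>>>>       catchUp(partition);   // replay only the missing updates
>>>>>>     }
>>>>>>   }
>>>>>> }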
>>>>>>
>>>>>>
>>>>>> There is one more difference in how Helix handles a participant
>>>>>> restarting vs. a ZK session expiry. When a participant starts (or restarts),
>>>>>> it creates a new StateModel (by calling createNewStateModel() in your
>>>>>> StateModelFactory) for each partition. However, if a participant loses its ZK
>>>>>> session and comes back (with a new session), it will reuse the StateModels
>>>>>> for partitions that were there before instead of creating new ones. You may
>>>>>> leverage this to tell whether a participant has been restarted or has just
>>>>>> re-established its ZK connection.
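>>>>>>
>>>>>> For example, a sketch of one way to exploit that (assuming, as Kishore
>>>>>> noted above, that reset() is invoked on the reused StateModel when the
>>>>>> session is re-established; the freshInstance flag is illustrative):
>>>>>>
>>>>>> public class MyStateModel extends StateModel {
>>>>>>   // true only for a brand-new StateModel, i.e. after a restart
>>>>>>   private boolean freshInstance = true;
>>>>>>
>>>>>>   @Override
>>>>>>   public void reset() {
>>>>>>     // Helix calls this on the reused instance after session expiry,
>>>>>>     // so from here on we know it was a reconnect, not a restart.
>>>>>>     freshInstance = false;
>>>>>>     super.reset();
>>>>>>   }
>>>>>>
>>>>>>   @Transition(from = "OFFLINE", to = "SLAVE")
>>>>>>   public void onBecomeSlaveFromOffline(Message msg,
>>>>>>       NotificationContext context) {
>>>>>>     if (freshInstance) {
>>>>>>       // restarted: local data may be gone, consider a bootstrap
>>>>>>     } else {
>>>>>>       // reconnected: local data should still be there, catch up
>>>>>>     }
>>>>>>     freshInstance = false;
>>>>>>   }
>>>>>> }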
>>>>>>
>>>>>>
>>>>>> In addition, the delay feature in DelayedAutoRebalancer is a
>>>>>> little different from what you may understand. When you lose a participant
>>>>>> (e.g., it crashed or is in maintenance), you lose one replica of some
>>>>>> partitions. In this situation, Helix would usually bring up a new replica on
>>>>>> some other live node immediately to maintain the required replica count.
>>>>>> However, this may have a performance impact, since bringing up a new
>>>>>> replica can require a data bootstrap on the new node. If you expect the
>>>>>> original participant to be back online soon, and you can tolerate losing
>>>>>> one or more replicas in the short term, then you can set a delay time,
>>>>>> during which Helix will not bring up a new replica. Hope that makes it more
>>>>>> clear.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Lei
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Lei Xia*
>>>>>>
>>>>>>
>>>>>> Data Infra/Helix
>>>>>>
>>>>>> [email protected]
>>>>>> www.linkedin.com/in/lxia1
>>>>>> ------------------------------
>>>>>> *From:* Bo Liu <[email protected]>
>>>>>> *Sent:* Monday, January 22, 2018 11:12:48 PM
>>>>>> *To:* [email protected]
>>>>>> *Subject:* differentiate between bootstrap and a soft failure
>>>>>>
>>>>>> Hi There,
>>>>>>
>>>>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How
>>>>>> can a participant differentiate between these two cases:
>>>>>>
>>>>>> 1) When a participant first joins a cluster, it will be requested to
>>>>>> transition from OFFLINE to SLAVE. Since the participant doesn't have any
>>>>>> data for this partition, it needs to bootstrap and download data from another
>>>>>> participant or a data source.
>>>>>> 2) When a participant loses its ZK session, the controller will
>>>>>> automatically change the participant to OFFLINE in ZK. If the
>>>>>> participant manages to establish a new session to ZK before the delay
>>>>>> time threshold, the controller will send a request to it to switch from
>>>>>> OFFLINE to SLAVE. In this case, the participant already has the data for
>>>>>> the partition, so it doesn't need to bootstrap from other data sources.
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Bo
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Bo
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards,
>> Bo
>>
>>
>
>
> --
> Best regards,
> Bo
>
>