Re: differentiate between bootstrap and a soft failure

Bo Liu Fri, 26 Jan 2018 11:51:11 -0800

Thanks Lei, will try it out.
Yes, a tutorial page for this new feature would be very helpful.


On Jan 26, 2018 09:49, "Lei Xia" <l...@apache.org> wrote:

> Hi, Bo
>
>    That is not the expected behavior.   Would you add (or replace) the
> following configs into your idealstate?   The ""MIN_ACTIVE_REPLICAS" tells
> Helix the minimal replica it should maintain, for example, if your total
> replica count is 3 and you lose 2 instances, Helix will bring at least 1
> more replica online immediately irregarding of delayed setting to meet the
> minimal replica requirement.
>
>     ,"REBALANCE_STRATEGY":"org.apache.helix.controller.
> rebalancer.strategy.CrushRebalanceStrategy"
>     , "MIN_ACTIVE_REPLICAS":"2"
>     ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebalancer.
> DelayedAutoRebalancer"
>
> Also please add the following two configs into your ClusterConfig,
> specially DELAY_REBALANCE_TIME specifies how long Helix should delay to
> bring new replica, e.g, if an instance is down and does not come back after
> 600000ms, Helix will move all replica on that instance to other live
> instances.
>
> "DELAY_REBALANCE_ENABLED" : "true",
>     "DELAY_REBALANCE_TIME" : "600000",
>
>
> Please have a try and let us know how it works.  And apologize to not have
> an updated manual on our website, we are working on updating all of our
> developer manuals for all latest new features, it will be out soon.
>
>
> Thanks
> Lei
>
> On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu <newpoo....@gmail.com> wrote:
>
>> I tried to run with DelayedAutoRebalancer. When a participant host
>> (localhost_12913) was killed, shards hosted on it were not moved, which is
>> expected.
>> And the external view for the resource is like:
>>
>> ExternalView for test:
>>
>> {
>>
>>   "id" : "test",
>>
>>   "mapFields" : {
>>
>>     "test_0" : {
>>
>>       "localhost_12914" : "ONLINE"
>>
>>     },
>>
>>     "test_1" : {
>>
>>       "localhost_12914" : "ONLINE",
>>
>>       "localhost_12915" : "ONLINE"
>>
>>     },
>>
>>     "test_2" : {
>>
>>       "localhost_12915" : "ONLINE"
>>
>>     }
>>
>>   },
>>
>>   "listFields" : {
>>
>>   },
>>
>>   "simpleFields" : {
>>
>>     "BUCKET_SIZE" : "0",
>>
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>
>>     "NUM_PARTITIONS" : "3",
>>
>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>
>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>
>>     "REPLICAS" : "2",
>>
>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>
>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>
>>   }
>>
>> }
>>
>> However, when I restarted the participant, the participant didn't get any
>> new transition requests and the external view became:
>>
>> ExternalView for test:
>>
>> {
>>
>>   "id" : "test",
>>
>>   "mapFields" : {
>>
>>     "test_0" : {
>>
>>       "localhost_12913" : "OFFLINE",
>>
>>       "localhost_12914" : "ONLINE"
>>
>>     },
>>
>>     "test_1" : {
>>
>>       "localhost_12914" : "ONLINE",
>>
>>       "localhost_12915" : "ONLINE"
>>
>>     },
>>
>>     "test_2" : {
>>
>>       "localhost_12913" : "OFFLINE",
>>
>>       "localhost_12915" : "ONLINE"
>>
>>     }
>>
>>   },
>>
>>   "listFields" : {
>>
>>   },
>>
>>   "simpleFields" : {
>>
>>     "BUCKET_SIZE" : "0",
>>
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>
>>     "NUM_PARTITIONS" : "3",
>>
>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>
>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>
>>     "REPLICAS" : "2",
>>
>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>
>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>
>>   }
>>
>> }
>>
>> I am wondering if this is the expected behavior?
>>
>>
>>
>> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu <newpoo....@gmail.com> wrote:
>>
>>> Great, thank you for the prompt reply.
>>>
>>> Thanks,
>>> Bo
>>>
>>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g <g.kish...@gmail.com> wrote:
>>>
>>>>
>>>>    1. Yes, you can set the max transitions constraint on per
>>>>    partition, per instance, per resource scope. There is a helix admin API 
>>>> to
>>>>    set the constraint. I dont have it handy.
>>>>    2.  Yes, Helix will send OFFLINE->SLAVE transitions to all
>>>>    partitions that were on the host and still present in the idealstate. If
>>>>    its removed from Idealstate, it will send OFFLINE->DROPPED transition.
>>>>    3. Right. Expiry is same as a restart. The only difference is
>>>>    with expiry, it calls reset method on the statemodel where one can 
>>>> plugin
>>>>    custom behavior.
>>>>
>>>>
>>>>
>>>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <newpoo....@gmail.com> wrote:
>>>>
>>>>> Thanks Kishore & Lei!
>>>>>
>>>>> It's a good point to rely on the data in a local partition to decide
>>>>> if a bootstrap is needed or catching up is good enough.'
>>>>>
>>>>> A few more questions.
>>>>>
>>>>> 1. is there a way to allow at most one transition for a partition at a
>>>>> time? During a state transition, a participant needs to setup proper
>>>>> replication upstream for itself (in the case where it is transiting to
>>>>> Slave) or other replicas (in the case it is transiting to Master). So the
>>>>> participant needs to learn the ip:port for other replicas in the cluster.
>>>>> No concurrent transitions allowed for a partition will make it much 
>>>>> easier.
>>>>>
>>>>> 2. When a participant restarts, I assume it will connect to ZK with a
>>>>> new session id. With DelayedAutoRebalancer, helix will not move
>>>>> replicas away from the participants, but it will promote some Slave
>>>>> replicas on other hosts to be the new Masters. Once the restarted host is
>>>>> back, will helix send "OFFLINE -> SLAVE" transition requests to it for all
>>>>> the partitions that were on this participant before the restart?
>>>>>
>>>>> 3. When the ZK session is expired on a participant (no restart), helix
>>>>> will behave the same, i.e., sending "OFFLINE->SLAVE" for all partitions
>>>>> to the participant once it reconnect to ZK, right?
>>>>>
>>>>> Thanks,
>>>>> Bo
>>>>>
>>>>> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <g.kish...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Relying on reuse of the same statemodel instance by Helix might make
>>>>>> it model too rigid and tied to current implementation in Helix. Let's not
>>>>>> expose that to the clients.
>>>>>>
>>>>>> Helix internally carries over the previous partitions assignment
>>>>>> during startup but sets the state to initial state (OFFLINE in this case)
>>>>>> by default. If the client really needs to know what was the previous 
>>>>>> state,
>>>>>> we can provide a hook to the client to compute the initial state. In any
>>>>>> case, lets hear more from Bo before making any changes.
>>>>>>
>>>>>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <l...@linkedin.com> wrote:
>>>>>>
>>>>>>> Hi, Bo
>>>>>>>
>>>>>>>
>>>>>>>   As Kishore commented, your offline->slave state transition
>>>>>>> callback needs some logic to determine whether a bootstrap or catch up 
>>>>>>> is
>>>>>>> needed to transit a replica to slave.  A common way is to persist the 
>>>>>>> data
>>>>>>> version of a local partition somewhere,  and during offline->slave,
>>>>>>> comparing local version (if there is) with current Master's version to
>>>>>>> determine if bootstrap (if version is null or too old) or catch-up is
>>>>>>> needed.
>>>>>>>
>>>>>>>
>>>>>>>   There is one more difference in how Helix handles participant
>>>>>>> restarting vs ZK session. When a participant starts (or restarts), it
>>>>>>> creates a new StateModel (by calling CreateStateModel() in your
>>>>>>> StateModelFactory) for each partition.  However, if a participant loses 
>>>>>>> ZK
>>>>>>> session and comes back (with new session), it will reuse the StateModel 
>>>>>>> for
>>>>>>> partitions that were there before instead of creating a new one.  You 
>>>>>>> may
>>>>>>> leverage this to tell whether a participant has been restarted or just
>>>>>>> re-established the ZK connection.
>>>>>>>
>>>>>>>
>>>>>>>   In addition, the Delayed feature in DelayedAutoRebalancer is a
>>>>>>> little different then what you may understand.  When you lose a 
>>>>>>> participant
>>>>>>> (e.g, crashed, in maintenance),  you lose one replica for some 
>>>>>>> partitions.
>>>>>>> In this situation, Helix will usually bring up a new replica in some 
>>>>>>> other
>>>>>>> live node immediately to maintain the required replica count.
>>>>>>> However, this may bring performance impact since bringing a new replica 
>>>>>>> can
>>>>>>> require data bootstrap in new node.  If you expect the original 
>>>>>>> participant
>>>>>>> will be back online soon and also you can tolerate losing one or more
>>>>>>> replica in short-term, then you can set a delay time here. In which 
>>>>>>> Helix
>>>>>>> will not bring a new replica before this time.  Hope that makes it more
>>>>>>> clear.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Lei
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Lei Xia*
>>>>>>>
>>>>>>>
>>>>>>> Data Infra/Helix
>>>>>>>
>>>>>>> l...@linkedin.com
>>>>>>> www.linkedin.com/in/lxia1
>>>>>>> ------------------------------
>>>>>>> *From:* Bo Liu <newpoo....@gmail.com>
>>>>>>> *Sent:* Monday, January 22, 2018 11:12:48 PM
>>>>>>> *To:* user@helix.apache.org
>>>>>>> *Subject:* differentiate between bootstrap and a soft failure
>>>>>>>
>>>>>>> Hi There,
>>>>>>>
>>>>>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How
>>>>>>> can a participant differentiate between these two cases:
>>>>>>>
>>>>>>> 1) when a participant first joins a cluster, it will be requested to
>>>>>>> transit from OFFLINE to SLAVE. Since the participant doesn't have any 
>>>>>>> data
>>>>>>> for this partition, it needs to bootstrap and download data from another
>>>>>>> participant or a data source.
>>>>>>> 2) when a participant loses its ZK session, the controller will
>>>>>>> automatically change the participant to be OFFLINE in ZK. If the
>>>>>>> participant manages to establish a new session to ZK before the delayed
>>>>>>> time threshold, the controller will send a request to it to switch from
>>>>>>> OFFLINE to SLAVE. In this case, the participant already has the data for
>>>>>>> the partition, so it doesn't need to bootstrap from other data sources.
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Bo
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Bo
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Bo
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Bo
>>
>>
>

Re: differentiate between bootstrap and a soft failure

Reply via email to