Re: differentiate between bootstrap and a soft failure

Bo Liu Fri, 26 Jan 2018 13:23:10 -0800

After applying those changes, I am still observing the same behavior.


On Fri, Jan 26, 2018 at 11:50 AM, Bo Liu <[email protected]> wrote:

> Thanks Lei, will try it out.
> Yes, a tutorial page for this new feature would be very helpful.
>
> On Jan 26, 2018 09:49, "Lei Xia" <[email protected]> wrote:
>
>> Hi, Bo
>>
>>    That is not the expected behavior.   Would you add (or replace) the
>> following configs into your idealstate?   The ""MIN_ACTIVE_REPLICAS" tells
>> Helix the minimal replica it should maintain, for example, if your total
>> replica count is 3 and you lose 2 instances, Helix will bring at least 1
>> more replica online immediately irregarding of delayed setting to meet the
>> minimal replica requirement.
>>
>>     ,"REBALANCE_STRATEGY":"org.apache.helix.controller.rebalance
>> r.strategy.CrushRebalanceStrategy"
>>     , "MIN_ACTIVE_REPLICAS":"2"
>>     ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebala
>> ncer.DelayedAutoRebalancer"
>>
>> Also please add the following two configs into your ClusterConfig,
>> specially DELAY_REBALANCE_TIME specifies how long Helix should delay to
>> bring new replica, e.g, if an instance is down and does not come back after
>> 600000ms, Helix will move all replica on that instance to other live
>> instances.
>>
>> "DELAY_REBALANCE_ENABLED" : "true",
>>     "DELAY_REBALANCE_TIME" : "600000",
>>
>>
>> Please have a try and let us know how it works.  And apologize to not
>> have an updated manual on our website, we are working on updating all of
>> our developer manuals for all latest new features, it will be out soon.
>>
>>
>> Thanks
>> Lei
>>
>> On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu <[email protected]> wrote:
>>
>>> I tried to run with DelayedAutoRebalancer. When a participant host
>>> (localhost_12913) was killed, shards hosted on it were not moved, which is
>>> expected.
>>> And the external view for the resource is like:
>>>
>>> ExternalView for test:
>>>
>>> {
>>>
>>>   "id" : "test",
>>>
>>>   "mapFields" : {
>>>
>>>     "test_0" : {
>>>
>>>       "localhost_12914" : "ONLINE"
>>>
>>>     },
>>>
>>>     "test_1" : {
>>>
>>>       "localhost_12914" : "ONLINE",
>>>
>>>       "localhost_12915" : "ONLINE"
>>>
>>>     },
>>>
>>>     "test_2" : {
>>>
>>>       "localhost_12915" : "ONLINE"
>>>
>>>     }
>>>
>>>   },
>>>
>>>   "listFields" : {
>>>
>>>   },
>>>
>>>   "simpleFields" : {
>>>
>>>     "BUCKET_SIZE" : "0",
>>>
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>
>>>     "NUM_PARTITIONS" : "3",
>>>
>>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>>
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>
>>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>>
>>>     "REPLICAS" : "2",
>>>
>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>
>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>
>>>   }
>>>
>>> }
>>>
>>> However, when I restarted the participant, the participant didn't get
>>> any new transition requests and the external view became:
>>>
>>> ExternalView for test:
>>>
>>> {
>>>
>>>   "id" : "test",
>>>
>>>   "mapFields" : {
>>>
>>>     "test_0" : {
>>>
>>>       "localhost_12913" : "OFFLINE",
>>>
>>>       "localhost_12914" : "ONLINE"
>>>
>>>     },
>>>
>>>     "test_1" : {
>>>
>>>       "localhost_12914" : "ONLINE",
>>>
>>>       "localhost_12915" : "ONLINE"
>>>
>>>     },
>>>
>>>     "test_2" : {
>>>
>>>       "localhost_12913" : "OFFLINE",
>>>
>>>       "localhost_12915" : "ONLINE"
>>>
>>>     }
>>>
>>>   },
>>>
>>>   "listFields" : {
>>>
>>>   },
>>>
>>>   "simpleFields" : {
>>>
>>>     "BUCKET_SIZE" : "0",
>>>
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>
>>>     "NUM_PARTITIONS" : "3",
>>>
>>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>>
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>
>>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>>
>>>     "REPLICAS" : "2",
>>>
>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>
>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>
>>>   }
>>>
>>> }
>>>
>>> I am wondering if this is the expected behavior?
>>>
>>>
>>>
>>> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu <[email protected]> wrote:
>>>
>>>> Great, thank you for the prompt reply.
>>>>
>>>> Thanks,
>>>> Bo
>>>>
>>>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g <[email protected]> wrote:
>>>>
>>>>>
>>>>>    1. Yes, you can set the max transitions constraint on per
>>>>>    partition, per instance, per resource scope. There is a helix admin 
>>>>> API to
>>>>>    set the constraint. I dont have it handy.
>>>>>    2.  Yes, Helix will send OFFLINE->SLAVE transitions to all
>>>>>    partitions that were on the host and still present in the idealstate. 
>>>>> If
>>>>>    its removed from Idealstate, it will send OFFLINE->DROPPED transition.
>>>>>    3. Right. Expiry is same as a restart. The only difference is
>>>>>    with expiry, it calls reset method on the statemodel where one can 
>>>>> plugin
>>>>>    custom behavior.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <[email protected]> wrote:
>>>>>
>>>>>> Thanks Kishore & Lei!
>>>>>>
>>>>>> It's a good point to rely on the data in a local partition to decide
>>>>>> if a bootstrap is needed or catching up is good enough.'
>>>>>>
>>>>>> A few more questions.
>>>>>>
>>>>>> 1. is there a way to allow at most one transition for a partition at
>>>>>> a time? During a state transition, a participant needs to setup proper
>>>>>> replication upstream for itself (in the case where it is transiting to
>>>>>> Slave) or other replicas (in the case it is transiting to Master). So the
>>>>>> participant needs to learn the ip:port for other replicas in the cluster.
>>>>>> No concurrent transitions allowed for a partition will make it much 
>>>>>> easier.
>>>>>>
>>>>>> 2. When a participant restarts, I assume it will connect to ZK with a
>>>>>> new session id. With DelayedAutoRebalancer, helix will not move
>>>>>> replicas away from the participants, but it will promote some Slave
>>>>>> replicas on other hosts to be the new Masters. Once the restarted host is
>>>>>> back, will helix send "OFFLINE -> SLAVE" transition requests to it for 
>>>>>> all
>>>>>> the partitions that were on this participant before the restart?
>>>>>>
>>>>>> 3. When the ZK session is expired on a participant (no restart),
>>>>>> helix will behave the same, i.e., sending "OFFLINE->SLAVE" for all
>>>>>> partitions to the participant once it reconnect to ZK, right?
>>>>>>
>>>>>> Thanks,
>>>>>> Bo
>>>>>>
>>>>>> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Relying on reuse of the same statemodel instance by Helix might make
>>>>>>> it model too rigid and tied to current implementation in Helix. Let's 
>>>>>>> not
>>>>>>> expose that to the clients.
>>>>>>>
>>>>>>> Helix internally carries over the previous partitions assignment
>>>>>>> during startup but sets the state to initial state (OFFLINE in this 
>>>>>>> case)
>>>>>>> by default. If the client really needs to know what was the previous 
>>>>>>> state,
>>>>>>> we can provide a hook to the client to compute the initial state. In any
>>>>>>> case, lets hear more from Bo before making any changes.
>>>>>>>
>>>>>>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi, Bo
>>>>>>>>
>>>>>>>>
>>>>>>>>   As Kishore commented, your offline->slave state transition
>>>>>>>> callback needs some logic to determine whether a bootstrap or catch up 
>>>>>>>> is
>>>>>>>> needed to transit a replica to slave.  A common way is to persist the 
>>>>>>>> data
>>>>>>>> version of a local partition somewhere,  and during offline->slave,
>>>>>>>> comparing local version (if there is) with current Master's version to
>>>>>>>> determine if bootstrap (if version is null or too old) or catch-up is
>>>>>>>> needed.
>>>>>>>>
>>>>>>>>
>>>>>>>>   There is one more difference in how Helix handles participant
>>>>>>>> restarting vs ZK session. When a participant starts (or restarts), it
>>>>>>>> creates a new StateModel (by calling CreateStateModel() in your
>>>>>>>> StateModelFactory) for each partition.  However, if a participant 
>>>>>>>> loses ZK
>>>>>>>> session and comes back (with new session), it will reuse the 
>>>>>>>> StateModel for
>>>>>>>> partitions that were there before instead of creating a new one.  You 
>>>>>>>> may
>>>>>>>> leverage this to tell whether a participant has been restarted or just
>>>>>>>> re-established the ZK connection.
>>>>>>>>
>>>>>>>>
>>>>>>>>   In addition, the Delayed feature in DelayedAutoRebalancer is a
>>>>>>>> little different then what you may understand.  When you lose a 
>>>>>>>> participant
>>>>>>>> (e.g, crashed, in maintenance),  you lose one replica for some 
>>>>>>>> partitions.
>>>>>>>> In this situation, Helix will usually bring up a new replica in some 
>>>>>>>> other
>>>>>>>> live node immediately to maintain the required replica count.
>>>>>>>> However, this may bring performance impact since bringing a new 
>>>>>>>> replica can
>>>>>>>> require data bootstrap in new node.  If you expect the original 
>>>>>>>> participant
>>>>>>>> will be back online soon and also you can tolerate losing one or more
>>>>>>>> replica in short-term, then you can set a delay time here. In which 
>>>>>>>> Helix
>>>>>>>> will not bring a new replica before this time.  Hope that makes it more
>>>>>>>> clear.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Lei
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Lei Xia*
>>>>>>>>
>>>>>>>>
>>>>>>>> Data Infra/Helix
>>>>>>>>
>>>>>>>> [email protected]
>>>>>>>> www.linkedin.com/in/lxia1
>>>>>>>> ------------------------------
>>>>>>>> *From:* Bo Liu <[email protected]>
>>>>>>>> *Sent:* Monday, January 22, 2018 11:12:48 PM
>>>>>>>> *To:* [email protected]
>>>>>>>> *Subject:* differentiate between bootstrap and a soft failure
>>>>>>>>
>>>>>>>> Hi There,
>>>>>>>>
>>>>>>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer.
>>>>>>>> How can a participant differentiate between these two cases:
>>>>>>>>
>>>>>>>> 1) when a participant first joins a cluster, it will be requested
>>>>>>>> to transit from OFFLINE to SLAVE. Since the participant doesn't have 
>>>>>>>> any
>>>>>>>> data for this partition, it needs to bootstrap and download data from
>>>>>>>> another participant or a data source.
>>>>>>>> 2) when a participant loses its ZK session, the controller will
>>>>>>>> automatically change the participant to be OFFLINE in ZK. If the
>>>>>>>> participant manages to establish a new session to ZK before the delayed
>>>>>>>> time threshold, the controller will send a request to it to switch from
>>>>>>>> OFFLINE to SLAVE. In this case, the participant already has the data 
>>>>>>>> for
>>>>>>>> the partition, so it doesn't need to bootstrap from other data sources.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Bo
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Bo
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Bo
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Bo
>>>
>>>
>>


-- 
Best regards,
Bo

Re: differentiate between bootstrap and a soft failure

Reply via email to