1. Yes, you can set the max-transitions constraint at per-partition,
   per-instance, or per-resource scope. There is a Helix admin API to set
   the constraint. I don't have it handy, but there is a rough sketch below.
   2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions
   that were on the host and are still present in the IdealState. If a
   partition has been removed from the IdealState, it will send an
   OFFLINE->DROPPED transition instead.
   3. Right. Expiry is the same as a restart. The only difference is that
   with expiry, Helix calls the reset() method on the StateModel, where one
   can plug in custom behavior (see the state-model sketch below).
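
For (1), setting the constraint looks roughly like this (a sketch from
memory of the Helix throttling tutorial -- the ZK address, cluster,
resource, and constraint names are placeholders, and attribute names may
vary by Helix version):

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.ClusterConstraints.ConstraintType;
    import org.apache.helix.model.builder.ConstraintItemBuilder;

    public class TransitionThrottleExample {
      public static void main(String[] args) {
        HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

        // Allow at most one pending state-transition message per partition
        // of resource "MyResource".
        ConstraintItemBuilder builder = new ConstraintItemBuilder();
        builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
            .addConstraintAttribute("RESOURCE", "MyResource")
            .addConstraintAttribute("PARTITION", ".*")
            .addConstraintAttribute("CONSTRAINT_VALUE", "1");

        admin.setConstraint("MyCluster", ConstraintType.MESSAGE_CONSTRAINT,
            "maxOneTransitionPerPartition", builder.build());
      }
    }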
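
For (2) and (3), the relevant hooks are on the participant's state model; a
minimal sketch (the class name is a placeholder, and the comments reflect
the behavior described in this thread):

    import org.apache.helix.NotificationContext;
    import org.apache.helix.model.Message;
    import org.apache.helix.participant.statemachine.StateModel;
    import org.apache.helix.participant.statemachine.StateModelInfo;
    import org.apache.helix.participant.statemachine.Transition;

    @StateModelInfo(initialState = "OFFLINE",
        states = {"MASTER", "SLAVE", "OFFLINE", "DROPPED", "ERROR"})
    public class MyMasterSlaveStateModel extends StateModel {

      // Sent for every partition that was on this host and is still in the
      // IdealState when the participant comes back.
      @Transition(from = "OFFLINE", to = "SLAVE")
      public void onBecomeSlaveFromOffline(Message message, NotificationContext context) {
        // decide whether to bootstrap or just catch up (see Lei's note below)
      }

      // Sent instead when the partition has been removed from the IdealState.
      @Transition(from = "OFFLINE", to = "DROPPED")
      public void onBecomeDroppedFromOffline(Message message, NotificationContext context) {
        // clean up local data for this partition
      }

      // Called on session expiry, where the state model instance is reused;
      // plug custom expiry behavior in here.
      @Override
      public void reset() {
      }
    }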



On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <[email protected]> wrote:

> Thanks Kishore & Lei!
>
> It's a good point to rely on the data in a local partition to decide if a
> bootstrap is needed or catching up is good enough.
>
> A few more questions.
>
> 1. Is there a way to allow at most one transition for a partition at a
> time? During a state transition, a participant needs to set up the proper
> replication upstream for itself (in the case where it is transitioning to
> Slave) or for other replicas (in the case where it is transitioning to
> Master). So the participant needs to learn the ip:port of the other
> replicas in the cluster. Disallowing concurrent transitions for a
> partition would make this much easier.
>
> 2. When a participant restarts, I assume it will connect to ZK with a new
> session id. With DelayedAutoRebalancer, Helix will not move replicas away
> from the participant, but it will promote some Slave replicas on other
> hosts to be the new Masters. Once the restarted host is back, will Helix
> send "OFFLINE -> SLAVE" transition requests to it for all the partitions
> that were on this participant before the restart?
>
> 3. When the ZK session expires on a participant (no restart), Helix will
> behave the same, i.e., send "OFFLINE->SLAVE" for all partitions to the
> participant once it reconnects to ZK, right?
>
> Thanks,
> Bo
>
> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <[email protected]> wrote:
>
>> Relying on Helix reusing the same StateModel instance might make the
>> model too rigid and tied to the current implementation in Helix. Let's
>> not expose that to the clients.
>>
>> Helix internally carries over the previous partition assignment during
>> startup but sets the state to the initial state (OFFLINE in this case) by
>> default. If the client really needs to know what the previous state was,
>> we can provide a hook for the client to compute the initial state. In any
>> case, let's hear more from Bo before making any changes.
>>
>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <[email protected]> wrote:
>>
>>> Hi, Bo
>>>
>>>
>>>   As Kishore commented, your offline->slave state transition callback
>>> needs some logic to determine whether a bootstrap or a catch-up is needed
>>> to bring a replica to SLAVE.  A common way is to persist the data version
>>> of a local partition somewhere, and during offline->slave, compare the
>>> local version (if there is one) with the current Master's version to
>>> determine whether a bootstrap (if the version is null or too old) or a
>>> catch-up is needed.
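>>>
>>>   Roughly, inside the offline->slave callback (just a sketch -- the
>>> getLocalVersion/getMasterVersion/bootstrapFromPeer/catchUpFromMaster
>>> helpers and MAX_LAG are your own code, not Helix APIs):
>>>
>>>     @Transition(from = "OFFLINE", to = "SLAVE")
>>>     public void onBecomeSlaveFromOffline(Message msg, NotificationContext ctx) {
>>>       String partition = msg.getPartitionName();
>>>       Long localVersion = getLocalVersion(partition);    // null if no local data
>>>       long masterVersion = getMasterVersion(partition);  // ask the current Master
>>>       if (localVersion == null || masterVersion - localVersion > MAX_LAG) {
>>>         bootstrapFromPeer(partition);    // full copy from the Master or a data source
>>>       } else {
>>>         catchUpFromMaster(partition);    // replay only the missing delta
>>>       }
>>>     }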
>>>
>>>
>>>   There is one more difference in how Helix handles a participant
>>> restart vs. a ZK session expiry. When a participant starts (or restarts),
>>> it creates a new StateModel (by calling CreateStateModel() in your
>>> StateModelFactory) for each partition.  However, if a participant loses
>>> its ZK session and comes back (with a new session), it will reuse the
>>> StateModels for partitions that were there before instead of creating new
>>> ones.  You may leverage this to tell whether a participant has been
>>> restarted or has just re-established the ZK connection.
>>>
>>>
>>>   In addition, the Delayed feature in DelayedAutoRebalancer is a little
>>> different than what you may understand.  When you lose a participant
>>> (e.g., crashed, in maintenance), you lose one replica for some
>>> partitions.  In this situation, Helix will usually bring up a new replica
>>> on some other live node immediately to maintain the required replica
>>> count.  However, this may have a performance impact, since bringing up a
>>> new replica can require a data bootstrap on the new node.  If you expect
>>> the original participant to be back online soon and you can tolerate
>>> losing one or more replicas in the short term, then you can set a delay
>>> time here, during which Helix will not bring up a new replica.  Hope that
>>> makes it more clear.
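>>>
>>>   For example, at the cluster level the delay can be set roughly like
>>> this (a sketch only -- the exact class and method names may differ
>>> between Helix versions, so please check the docs for your release):
>>>
>>>     ConfigAccessor configAccessor = new ConfigAccessor(zkAddr);
>>>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(clusterName);
>>>     clusterConfig.setRebalanceDelayTime(10 * 60 * 1000L);  // e.g. tolerate ~10 min
>>>     configAccessor.setClusterConfig(clusterName, clusterConfig);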
>>>
>>>
>>>
>>>
>>> Thanks
>>>
>>> Lei
>>>
>>>
>>>
>>>
>>> *Lei Xia*
>>>
>>>
>>> Data Infra/Helix
>>>
>>> [email protected]
>>> www.linkedin.com/in/lxia1
>>> ------------------------------
>>> *From:* Bo Liu <[email protected]>
>>> *Sent:* Monday, January 22, 2018 11:12:48 PM
>>> *To:* [email protected]
>>> *Subject:* differentiate between bootstrap and a soft failure
>>>
>>> Hi There,
>>>
>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How can
>>> a participant differentiate between these two cases:
>>>
>>> 1) When a participant first joins a cluster, it will be requested to
>>> transition from OFFLINE to SLAVE. Since the participant doesn't have any
>>> data for this partition, it needs to bootstrap and download data from
>>> another participant or a data source.
>>> 2) When a participant loses its ZK session, the controller will
>>> automatically change the participant to be OFFLINE in ZK. If the
>>> participant manages to establish a new session to ZK before the delay
>>> time threshold, the controller will send a request to it to switch from
>>> OFFLINE to SLAVE. In this case, the participant already has the data for
>>> the partition, so it doesn't need to bootstrap from other data sources.
>>>
>>> --
>>> Best regards,
>>> Bo
>>>
>>>
>>
>
>
> --
> Best regards,
> Bo
>
>
