Thanks Kishore & Lei!

Good point: I can rely on the data in the local partition to decide
whether a bootstrap is needed or a catch-up is good enough.
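
To check my understanding, here is a minimal sketch of that decision in
Java. ReplicaSyncDecision, decide() and maxLag are hypothetical names of
mine, not Helix APIs. I am also assuming that versions are non-negative,
monotonically increasing numbers, and that a local version ahead of the
Master's means the replica has diverged and should bootstrap.

```java
import java.util.OptionalLong;

/** Hypothetical helper for the OFFLINE->SLAVE callback; not part of Helix. */
final class ReplicaSyncDecision {

    enum Action { BOOTSTRAP, CATCH_UP }

    private ReplicaSyncDecision() {}

    /**
     * @param localVersion  version persisted for the local partition,
     *                      empty if there is no local data
     * @param masterVersion version reported by the current Master
     * @param maxLag        assumed threshold: the largest version gap that
     *                      can still be closed by catching up
     */
    static Action decide(OptionalLong localVersion, long masterVersion, long maxLag) {
        // No local data at all: this is a first join, so bootstrap.
        if (!localVersion.isPresent()) {
            return Action.BOOTSTRAP;
        }
        long lag = masterVersion - localVersion.getAsLong();
        // Local copy is ahead of the Master (assumed to mean divergence)
        // or too far behind to replay: bootstrap again.
        if (lag < 0 || lag > maxLag) {
            return Action.BOOTSTRAP;
        }
        // Local copy is recent enough: replay the missing updates only.
        return Action.CATCH_UP;
    }
}
```

I would call decide() from the OFFLINE->SLAVE callback, with the local
version read from wherever the participant persists it.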

A few more questions.

1. Is there a way to allow at most one transition for a partition at a
time? During a state transition, a participant needs to set up the proper
replication upstream for itself (when it is transitioning to Slave) or for
other replicas (when it is transitioning to Master), so it needs to learn
the ip:port of the other replicas in the cluster. Disallowing concurrent
transitions for a partition would make this much easier.

2. When a participant restarts, I assume it will connect to ZK with a new
session id. With DelayedAutoRebalancer, Helix will not move replicas away
from that participant, but it will promote some Slave replicas on other
hosts to be the new Masters. Once the restarted host is back, will Helix
send "OFFLINE -> SLAVE" transition requests to it for all the partitions
that were on this participant before the restart?

3. When the ZK session expires on a participant (no restart), Helix will
behave the same way, i.e., send "OFFLINE -> SLAVE" for all partitions to
the participant once it reconnects to ZK, right?


On Tue, Jan 23, 2018 at 10:39 AM, kishore g <> wrote:

> Relying on Helix reusing the same statemodel instance might make the
> model too rigid and tie it to the current implementation in Helix. Let's
> not expose that to the clients.
> Helix internally carries over the previous partition assignment during
> startup but sets the state to the initial state (OFFLINE in this case) by
> default. If the client really needs to know what the previous state was,
> we can provide a hook for the client to compute the initial state. In any
> case, let's hear more from Bo before making any changes.
>
> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <> wrote:
>> Hi, Bo
>>   As Kishore commented, your offline->slave state transition callback
>> needs some logic to determine whether a bootstrap or catch up is needed to
>> transit a replica to slave.  A common way is to persist the data version
>> of a local partition somewhere and, during offline->slave, compare the
>> local version (if there is one) with the current Master's version to
>> determine whether a bootstrap (if the local version is null or too old)
>> or a catch-up is needed.
>>   There is one more difference in how Helix handles a participant
>> restart vs. a ZK session loss. When a participant starts (or restarts), it
>> creates a new StateModel (by calling CreateStateModel() in your
>> StateModelFactory) for each partition.  However, if a participant loses ZK
>> session and comes back (with new session), it will reuse the StateModel for
>> partitions that were there before instead of creating a new one.  You may
>> leverage this to tell whether a participant has been restarted or just
>> re-established the ZK connection.
>>   In addition, the Delayed feature in DelayedAutoRebalancer is a little
>> different from what you may understand.  When you lose a participant
>> (e.g., it crashed or is in maintenance), you lose one replica for some
>> partitions.  In this situation, Helix will usually bring up a new replica
>> on some other live node immediately to maintain the required replica
>> count.  However, this may hurt performance, since bringing up a new
>> replica can require a data bootstrap on the new node.  If you expect the
>> original participant to be back online soon and you can tolerate losing
>> one or more replicas in the short term, then you can set a delay time
>> here, and Helix will not bring up a new replica until that delay has
>> elapsed.  Hope that makes it more clear.
>> Thanks
>> Lei
>> *Lei Xia*
>> Data Infra/Helix
>> ------------------------------
>> *From:* Bo Liu <>
>> *Sent:* Monday, January 22, 2018 11:12:48 PM
>> *To:*
>> *Subject:* differentiate between bootstrap and a soft failure
>> Hi There,
>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How can
>> a participant differentiate between these two cases:
>> 1) when a participant first joins a cluster, it will be requested to
>> transit from OFFLINE to SLAVE. Since the participant doesn't have any data
>> for this partition, it needs to bootstrap and download data from another
>> participant or a data source.
>> 2) when a participant loses its ZK session, the controller will
>> automatically change the participant to be OFFLINE in ZK. If the
>> participant manages to establish a new session to ZK before the delayed
>> time threshold, the controller will send a request to it to switch from
>> OFFLINE to SLAVE. In this case, the participant already has the data for
>> the partition, so it doesn't need to bootstrap from other data sources.
>> --
>> Best regards,
>> Bo

Best regards,
