1. Yes, you can set the max-transitions constraint at a per-partition, per-instance, or per-resource scope. There is a Helix admin API to set the constraint; I don't have it handy, but a sketch of the call is included below.

2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions that were on the host and are still present in the IdealState. If a partition has been removed from the IdealState, it will send an OFFLINE->DROPPED transition instead.

3. Right, expiry is the same as a restart. The only difference is that with expiry, Helix calls the reset() method on the StateModel, where one can plug in custom behavior.
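A sketch of that admin call, based on Helix's message-constraint support as I understand it. The ZK address, cluster name, resource name, and constraint id are made up for illustration, and the exact ConstraintAttribute values should be double-checked against your Helix version:

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ClusterConstraints.ConstraintAttribute;
import org.apache.helix.model.ClusterConstraints.ConstraintType;
import org.apache.helix.model.builder.ConstraintItemBuilder;

public class TransitionThrottleExample {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // illustrative ZK address

    // Allow at most one in-flight STATE_TRANSITION message per partition of resource "MyDB".
    ConstraintItemBuilder builder = new ConstraintItemBuilder();
    builder
        .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(), "STATE_TRANSITION")
        .addConstraintAttribute(ConstraintAttribute.RESOURCE.toString(), "MyDB")
        .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
        .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(), "1");

    admin.setConstraint("MyCluster", ConstraintType.MESSAGE_CONSTRAINT,
        "maxOneTransitionPerPartition", builder.build());
  }
}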
On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <[email protected]> wrote:
> Thanks Kishore & Lei!
>
> It's a good point to rely on the data in a local partition to decide if a bootstrap is needed or catching up is good enough.
>
> A few more questions.
>
> 1. Is there a way to allow at most one transition at a time for a partition? During a state transition, a participant needs to set up proper replication upstream for itself (in the case where it is transitioning to Slave) or for other replicas (in the case where it is transitioning to Master). So the participant needs to learn the ip:port of the other replicas in the cluster. Disallowing concurrent transitions for a partition would make this much easier.
>
> 2. When a participant restarts, I assume it will connect to ZK with a new session id. With DelayedAutoRebalancer, Helix will not move replicas away from the participant, but it will promote some Slave replicas on other hosts to be the new Masters. Once the restarted host is back, will Helix send "OFFLINE -> SLAVE" transition requests to it for all the partitions that were on this participant before the restart?
>
> 3. When the ZK session expires on a participant (no restart), Helix will behave the same way, i.e., send "OFFLINE -> SLAVE" for all partitions to the participant once it reconnects to ZK, right?
>
> Thanks,
> Bo
>
> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <[email protected]> wrote:
>
>> Relying on Helix reusing the same StateModel instance might make the model too rigid and tied to the current implementation in Helix. Let's not expose that to the clients.
>>
>> Helix internally carries over the previous partition assignment during startup but sets the state to the initial state (OFFLINE in this case) by default. If the client really needs to know what the previous state was, we can provide a hook for the client to compute the initial state. In any case, let's hear more from Bo before making any changes.
>>
>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <[email protected]> wrote:
>>
>>> Hi, Bo
>>>
>>> As Kishore commented, your OFFLINE->SLAVE state transition callback needs some logic to determine whether a bootstrap or a catch-up is needed to transition a replica to Slave. A common way is to persist the data version of a local partition somewhere, and during OFFLINE->SLAVE, compare the local version (if there is one) with the current Master's version to determine whether a bootstrap (if the version is null or too old) or a catch-up is needed (see the sketch after the quoted thread below).
>>>
>>> There is one more difference in how Helix handles a participant restart vs. a ZK session expiry. When a participant starts (or restarts), it creates a new StateModel for each partition (by calling createNewStateModel() in your StateModelFactory). However, if a participant loses its ZK session and comes back (with a new session), it will reuse the StateModels for the partitions that were there before instead of creating new ones. You may leverage this to tell whether a participant has been restarted or has just re-established the ZK connection.
>>>
>>> In addition, the delay feature in DelayedAutoRebalancer is a little different than what you may understand. When you lose a participant (e.g., crashed, in maintenance), you lose one replica of some partitions. In this situation, Helix will usually bring up a new replica on some other live node immediately to maintain the required replica count.
>>> However, this may have a performance impact, since bringing up a new replica can require a data bootstrap on the new node. If you expect the original participant to be back online soon and you can tolerate losing one or more replicas in the short term, then you can set a delay time, within which Helix will not bring up a new replica. Hope that makes it clearer.
>>>
>>> Thanks
>>>
>>> Lei
>>>
>>> Lei Xia
>>> Data Infra/Helix
>>> [email protected]
>>> www.linkedin.com/in/lxia1
>>> ------------------------------
>>> From: Bo Liu <[email protected]>
>>> Sent: Monday, January 22, 2018 11:12:48 PM
>>> To: [email protected]
>>> Subject: differentiate between bootstrap and a soft failure
>>>
>>> Hi There,
>>>
>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How can a participant differentiate between these two cases:
>>>
>>> 1) When a participant first joins a cluster, it will be requested to transition from OFFLINE to SLAVE. Since the participant doesn't have any data for this partition, it needs to bootstrap and download data from another participant or a data source.
>>>
>>> 2) When a participant loses its ZK session, the controller will automatically change the participant's replicas to OFFLINE in ZK. If the participant manages to establish a new session with ZK before the delay threshold, the controller will send it a request to switch from OFFLINE to SLAVE. In this case, the participant already has the data for the partition, so it doesn't need to bootstrap from other data sources.
>>>
>>> --
>>> Best regards,
>>> Bo
>
> --
> Best regards,
> Bo
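To make Lei's suggestion above concrete, here is a minimal sketch of an OFFLINE->SLAVE callback that chooses between bootstrap and catch-up by comparing a locally persisted version against the current Master's. The StateModel/@Transition plumbing is standard Helix; the version-tracking and replication helpers (readLocalVersion, fetchMasterVersion, bootstrapFromPeer, catchUpFromMaster, persistLocalVersion) and the lag threshold are hypothetical placeholders for application-specific logic:

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

@StateModelInfo(initialState = "OFFLINE", states = {"MASTER", "SLAVE", "OFFLINE"})
public class MyPartitionStateModel extends StateModel {
  private static final long MAX_CATCHUP_LAG = 10_000L; // app-specific threshold, illustrative

  private final String partitionName;

  public MyPartitionStateModel(String partitionName) {
    this.partitionName = partitionName;
  }

  @Transition(from = "OFFLINE", to = "SLAVE")
  public void onBecomeSlaveFromOffline(Message message, NotificationContext context) {
    Long localVersion = readLocalVersion();    // null means no local data for this partition
    long masterVersion = fetchMasterVersion(); // e.g., ask the current Master replica

    if (localVersion == null || masterVersion - localVersion > MAX_CATCHUP_LAG) {
      // No local copy, or too far behind: do a full bootstrap from a peer or external source.
      bootstrapFromPeer();
    } else {
      // Local copy is recent enough: just replay the missing updates from the Master.
      catchUpFromMaster(localVersion);
    }
  }

  @Transition(from = "SLAVE", to = "OFFLINE")
  public void onBecomeOfflineFromSlave(Message message, NotificationContext context) {
    // Persist the local data version so a later OFFLINE->SLAVE can choose catch-up over bootstrap.
    persistLocalVersion();
  }

  // --- Hypothetical application hooks, stubbed out for the sketch ---
  private Long readLocalVersion() { return null; }
  private long fetchMasterVersion() { return 0L; }
  private void bootstrapFromPeer() { }
  private void catchUpFromMaster(long fromVersion) { }
  private void persistLocalVersion() { }
}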
