Thanks Lei, will try it out. Yes, a tutorial page for this new feature would be very helpful.
On Jan 26, 2018 09:49, "Lei Xia" <l...@apache.org> wrote: > Hi, Bo > > That is not the expected behavior. Would you add (or replace) the > following configs into your idealstate? The ""MIN_ACTIVE_REPLICAS" tells > Helix the minimal replica it should maintain, for example, if your total > replica count is 3 and you lose 2 instances, Helix will bring at least 1 > more replica online immediately irregarding of delayed setting to meet the > minimal replica requirement. > > ,"REBALANCE_STRATEGY":"org.apache.helix.controller. > rebalancer.strategy.CrushRebalanceStrategy" > , "MIN_ACTIVE_REPLICAS":"2" > ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebalancer. > DelayedAutoRebalancer" > > Also please add the following two configs into your ClusterConfig, > specially DELAY_REBALANCE_TIME specifies how long Helix should delay to > bring new replica, e.g, if an instance is down and does not come back after > 600000ms, Helix will move all replica on that instance to other live > instances. > > "DELAY_REBALANCE_ENABLED" : "true", > "DELAY_REBALANCE_TIME" : "600000", > > > Please have a try and let us know how it works. And apologize to not have > an updated manual on our website, we are working on updating all of our > developer manuals for all latest new features, it will be out soon. > > > Thanks > Lei > > On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu <newpoo....@gmail.com> wrote: > >> I tried to run with DelayedAutoRebalancer. When a participant host >> (localhost_12913) was killed, shards hosted on it were not moved, which is >> expected. >> And the external view for the resource is like: >> >> ExternalView for test: >> >> { >> >> "id" : "test", >> >> "mapFields" : { >> >> "test_0" : { >> >> "localhost_12914" : "ONLINE" >> >> }, >> >> "test_1" : { >> >> "localhost_12914" : "ONLINE", >> >> "localhost_12915" : "ONLINE" >> >> }, >> >> "test_2" : { >> >> "localhost_12915" : "ONLINE" >> >> } >> >> }, >> >> "listFields" : { >> >> }, >> >> "simpleFields" : { >> >> "BUCKET_SIZE" : "0", >> >> "IDEAL_STATE_MODE" : "AUTO_REBALANCE", >> >> "NUM_PARTITIONS" : "3", >> >> "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer", >> >> "REBALANCE_MODE" : "FULL_AUTO", >> >> "REBALANCE_STRATEGY" : "AutoRebalanceStrategy", >> >> "REPLICAS" : "2", >> >> "STATE_MODEL_DEF_REF" : "OnlineOffline", >> >> "STATE_MODEL_FACTORY_NAME" : "DEFAULT" >> >> } >> >> } >> >> However, when I restarted the participant, the participant didn't get any >> new transition requests and the external view became: >> >> ExternalView for test: >> >> { >> >> "id" : "test", >> >> "mapFields" : { >> >> "test_0" : { >> >> "localhost_12913" : "OFFLINE", >> >> "localhost_12914" : "ONLINE" >> >> }, >> >> "test_1" : { >> >> "localhost_12914" : "ONLINE", >> >> "localhost_12915" : "ONLINE" >> >> }, >> >> "test_2" : { >> >> "localhost_12913" : "OFFLINE", >> >> "localhost_12915" : "ONLINE" >> >> } >> >> }, >> >> "listFields" : { >> >> }, >> >> "simpleFields" : { >> >> "BUCKET_SIZE" : "0", >> >> "IDEAL_STATE_MODE" : "AUTO_REBALANCE", >> >> "NUM_PARTITIONS" : "3", >> >> "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer", >> >> "REBALANCE_MODE" : "FULL_AUTO", >> >> "REBALANCE_STRATEGY" : "AutoRebalanceStrategy", >> >> "REPLICAS" : "2", >> >> "STATE_MODEL_DEF_REF" : "OnlineOffline", >> >> "STATE_MODEL_FACTORY_NAME" : "DEFAULT" >> >> } >> >> } >> >> I am wondering if this is the expected behavior? >> >> >> >> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu <newpoo....@gmail.com> wrote: >> >>> Great, thank you for the prompt reply. >>> >>> Thanks, >>> Bo >>> >>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g <g.kish...@gmail.com> wrote: >>> >>>> >>>> 1. Yes, you can set the max transitions constraint on per >>>> partition, per instance, per resource scope. There is a helix admin API >>>> to >>>> set the constraint. I dont have it handy. >>>> 2. Yes, Helix will send OFFLINE->SLAVE transitions to all >>>> partitions that were on the host and still present in the idealstate. If >>>> its removed from Idealstate, it will send OFFLINE->DROPPED transition. >>>> 3. Right. Expiry is same as a restart. The only difference is >>>> with expiry, it calls reset method on the statemodel where one can >>>> plugin >>>> custom behavior. >>>> >>>> >>>> >>>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <newpoo....@gmail.com> wrote: >>>> >>>>> Thanks Kishore & Lei! >>>>> >>>>> It's a good point to rely on the data in a local partition to decide >>>>> if a bootstrap is needed or catching up is good enough.' >>>>> >>>>> A few more questions. >>>>> >>>>> 1. is there a way to allow at most one transition for a partition at a >>>>> time? During a state transition, a participant needs to setup proper >>>>> replication upstream for itself (in the case where it is transiting to >>>>> Slave) or other replicas (in the case it is transiting to Master). So the >>>>> participant needs to learn the ip:port for other replicas in the cluster. >>>>> No concurrent transitions allowed for a partition will make it much >>>>> easier. >>>>> >>>>> 2. When a participant restarts, I assume it will connect to ZK with a >>>>> new session id. With DelayedAutoRebalancer, helix will not move >>>>> replicas away from the participants, but it will promote some Slave >>>>> replicas on other hosts to be the new Masters. Once the restarted host is >>>>> back, will helix send "OFFLINE -> SLAVE" transition requests to it for all >>>>> the partitions that were on this participant before the restart? >>>>> >>>>> 3. When the ZK session is expired on a participant (no restart), helix >>>>> will behave the same, i.e., sending "OFFLINE->SLAVE" for all partitions >>>>> to the participant once it reconnect to ZK, right? >>>>> >>>>> Thanks, >>>>> Bo >>>>> >>>>> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <g.kish...@gmail.com> >>>>> wrote: >>>>> >>>>>> Relying on reuse of the same statemodel instance by Helix might make >>>>>> it model too rigid and tied to current implementation in Helix. Let's not >>>>>> expose that to the clients. >>>>>> >>>>>> Helix internally carries over the previous partitions assignment >>>>>> during startup but sets the state to initial state (OFFLINE in this case) >>>>>> by default. If the client really needs to know what was the previous >>>>>> state, >>>>>> we can provide a hook to the client to compute the initial state. In any >>>>>> case, lets hear more from Bo before making any changes. >>>>>> >>>>>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <l...@linkedin.com> wrote: >>>>>> >>>>>>> Hi, Bo >>>>>>> >>>>>>> >>>>>>> As Kishore commented, your offline->slave state transition >>>>>>> callback needs some logic to determine whether a bootstrap or catch up >>>>>>> is >>>>>>> needed to transit a replica to slave. A common way is to persist the >>>>>>> data >>>>>>> version of a local partition somewhere, and during offline->slave, >>>>>>> comparing local version (if there is) with current Master's version to >>>>>>> determine if bootstrap (if version is null or too old) or catch-up is >>>>>>> needed. >>>>>>> >>>>>>> >>>>>>> There is one more difference in how Helix handles participant >>>>>>> restarting vs ZK session. When a participant starts (or restarts), it >>>>>>> creates a new StateModel (by calling CreateStateModel() in your >>>>>>> StateModelFactory) for each partition. However, if a participant loses >>>>>>> ZK >>>>>>> session and comes back (with new session), it will reuse the StateModel >>>>>>> for >>>>>>> partitions that were there before instead of creating a new one. You >>>>>>> may >>>>>>> leverage this to tell whether a participant has been restarted or just >>>>>>> re-established the ZK connection. >>>>>>> >>>>>>> >>>>>>> In addition, the Delayed feature in DelayedAutoRebalancer is a >>>>>>> little different then what you may understand. When you lose a >>>>>>> participant >>>>>>> (e.g, crashed, in maintenance), you lose one replica for some >>>>>>> partitions. >>>>>>> In this situation, Helix will usually bring up a new replica in some >>>>>>> other >>>>>>> live node immediately to maintain the required replica count. >>>>>>> However, this may bring performance impact since bringing a new replica >>>>>>> can >>>>>>> require data bootstrap in new node. If you expect the original >>>>>>> participant >>>>>>> will be back online soon and also you can tolerate losing one or more >>>>>>> replica in short-term, then you can set a delay time here. In which >>>>>>> Helix >>>>>>> will not bring a new replica before this time. Hope that makes it more >>>>>>> clear. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Lei >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> *Lei Xia* >>>>>>> >>>>>>> >>>>>>> Data Infra/Helix >>>>>>> >>>>>>> l...@linkedin.com >>>>>>> www.linkedin.com/in/lxia1 >>>>>>> ------------------------------ >>>>>>> *From:* Bo Liu <newpoo....@gmail.com> >>>>>>> *Sent:* Monday, January 22, 2018 11:12:48 PM >>>>>>> *To:* user@helix.apache.org >>>>>>> *Subject:* differentiate between bootstrap and a soft failure >>>>>>> >>>>>>> Hi There, >>>>>>> >>>>>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. How >>>>>>> can a participant differentiate between these two cases: >>>>>>> >>>>>>> 1) when a participant first joins a cluster, it will be requested to >>>>>>> transit from OFFLINE to SLAVE. Since the participant doesn't have any >>>>>>> data >>>>>>> for this partition, it needs to bootstrap and download data from another >>>>>>> participant or a data source. >>>>>>> 2) when a participant loses its ZK session, the controller will >>>>>>> automatically change the participant to be OFFLINE in ZK. If the >>>>>>> participant manages to establish a new session to ZK before the delayed >>>>>>> time threshold, the controller will send a request to it to switch from >>>>>>> OFFLINE to SLAVE. In this case, the participant already has the data for >>>>>>> the partition, so it doesn't need to bootstrap from other data sources. >>>>>>> >>>>>>> -- >>>>>>> Best regards, >>>>>>> Bo >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Best regards, >>>>> Bo >>>>> >>>>> >>>> >>> >>> >>> -- >>> Best regards, >>> Bo >>> >>> >> >> >> -- >> Best regards, >> Bo >> >> >