After applying those changes, I am still observing the same behavior.
On Fri, Jan 26, 2018 at 11:50 AM, Bo Liu <[email protected]> wrote: > Thanks Lei, will try it out. > Yes, a tutorial page for this new feature would be very helpful. > > On Jan 26, 2018 09:49, "Lei Xia" <[email protected]> wrote: > >> Hi, Bo >> >> That is not the expected behavior. Would you add (or replace) the >> following configs into your idealstate? The ""MIN_ACTIVE_REPLICAS" tells >> Helix the minimal replica it should maintain, for example, if your total >> replica count is 3 and you lose 2 instances, Helix will bring at least 1 >> more replica online immediately irregarding of delayed setting to meet the >> minimal replica requirement. >> >> ,"REBALANCE_STRATEGY":"org.apache.helix.controller.rebalance >> r.strategy.CrushRebalanceStrategy" >> , "MIN_ACTIVE_REPLICAS":"2" >> ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebala >> ncer.DelayedAutoRebalancer" >> >> Also please add the following two configs into your ClusterConfig, >> specially DELAY_REBALANCE_TIME specifies how long Helix should delay to >> bring new replica, e.g, if an instance is down and does not come back after >> 600000ms, Helix will move all replica on that instance to other live >> instances. >> >> "DELAY_REBALANCE_ENABLED" : "true", >> "DELAY_REBALANCE_TIME" : "600000", >> >> >> Please have a try and let us know how it works. And apologize to not >> have an updated manual on our website, we are working on updating all of >> our developer manuals for all latest new features, it will be out soon. >> >> >> Thanks >> Lei >> >> On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu <[email protected]> wrote: >> >>> I tried to run with DelayedAutoRebalancer. When a participant host >>> (localhost_12913) was killed, shards hosted on it were not moved, which is >>> expected. >>> And the external view for the resource is like: >>> >>> ExternalView for test: >>> >>> { >>> >>> "id" : "test", >>> >>> "mapFields" : { >>> >>> "test_0" : { >>> >>> "localhost_12914" : "ONLINE" >>> >>> }, >>> >>> "test_1" : { >>> >>> "localhost_12914" : "ONLINE", >>> >>> "localhost_12915" : "ONLINE" >>> >>> }, >>> >>> "test_2" : { >>> >>> "localhost_12915" : "ONLINE" >>> >>> } >>> >>> }, >>> >>> "listFields" : { >>> >>> }, >>> >>> "simpleFields" : { >>> >>> "BUCKET_SIZE" : "0", >>> >>> "IDEAL_STATE_MODE" : "AUTO_REBALANCE", >>> >>> "NUM_PARTITIONS" : "3", >>> >>> "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer", >>> >>> "REBALANCE_MODE" : "FULL_AUTO", >>> >>> "REBALANCE_STRATEGY" : "AutoRebalanceStrategy", >>> >>> "REPLICAS" : "2", >>> >>> "STATE_MODEL_DEF_REF" : "OnlineOffline", >>> >>> "STATE_MODEL_FACTORY_NAME" : "DEFAULT" >>> >>> } >>> >>> } >>> >>> However, when I restarted the participant, the participant didn't get >>> any new transition requests and the external view became: >>> >>> ExternalView for test: >>> >>> { >>> >>> "id" : "test", >>> >>> "mapFields" : { >>> >>> "test_0" : { >>> >>> "localhost_12913" : "OFFLINE", >>> >>> "localhost_12914" : "ONLINE" >>> >>> }, >>> >>> "test_1" : { >>> >>> "localhost_12914" : "ONLINE", >>> >>> "localhost_12915" : "ONLINE" >>> >>> }, >>> >>> "test_2" : { >>> >>> "localhost_12913" : "OFFLINE", >>> >>> "localhost_12915" : "ONLINE" >>> >>> } >>> >>> }, >>> >>> "listFields" : { >>> >>> }, >>> >>> "simpleFields" : { >>> >>> "BUCKET_SIZE" : "0", >>> >>> "IDEAL_STATE_MODE" : "AUTO_REBALANCE", >>> >>> "NUM_PARTITIONS" : "3", >>> >>> "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer", >>> >>> "REBALANCE_MODE" : "FULL_AUTO", >>> >>> "REBALANCE_STRATEGY" : "AutoRebalanceStrategy", >>> >>> "REPLICAS" : "2", >>> >>> "STATE_MODEL_DEF_REF" : "OnlineOffline", >>> >>> "STATE_MODEL_FACTORY_NAME" : "DEFAULT" >>> >>> } >>> >>> } >>> >>> I am wondering if this is the expected behavior? >>> >>> >>> >>> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu <[email protected]> wrote: >>> >>>> Great, thank you for the prompt reply. >>>> >>>> Thanks, >>>> Bo >>>> >>>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g <[email protected]> wrote: >>>> >>>>> >>>>> 1. Yes, you can set the max transitions constraint on per >>>>> partition, per instance, per resource scope. There is a helix admin >>>>> API to >>>>> set the constraint. I dont have it handy. >>>>> 2. Yes, Helix will send OFFLINE->SLAVE transitions to all >>>>> partitions that were on the host and still present in the idealstate. >>>>> If >>>>> its removed from Idealstate, it will send OFFLINE->DROPPED transition. >>>>> 3. Right. Expiry is same as a restart. The only difference is >>>>> with expiry, it calls reset method on the statemodel where one can >>>>> plugin >>>>> custom behavior. >>>>> >>>>> >>>>> >>>>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu <[email protected]> wrote: >>>>> >>>>>> Thanks Kishore & Lei! >>>>>> >>>>>> It's a good point to rely on the data in a local partition to decide >>>>>> if a bootstrap is needed or catching up is good enough.' >>>>>> >>>>>> A few more questions. >>>>>> >>>>>> 1. is there a way to allow at most one transition for a partition at >>>>>> a time? During a state transition, a participant needs to setup proper >>>>>> replication upstream for itself (in the case where it is transiting to >>>>>> Slave) or other replicas (in the case it is transiting to Master). So the >>>>>> participant needs to learn the ip:port for other replicas in the cluster. >>>>>> No concurrent transitions allowed for a partition will make it much >>>>>> easier. >>>>>> >>>>>> 2. When a participant restarts, I assume it will connect to ZK with a >>>>>> new session id. With DelayedAutoRebalancer, helix will not move >>>>>> replicas away from the participants, but it will promote some Slave >>>>>> replicas on other hosts to be the new Masters. Once the restarted host is >>>>>> back, will helix send "OFFLINE -> SLAVE" transition requests to it for >>>>>> all >>>>>> the partitions that were on this participant before the restart? >>>>>> >>>>>> 3. When the ZK session is expired on a participant (no restart), >>>>>> helix will behave the same, i.e., sending "OFFLINE->SLAVE" for all >>>>>> partitions to the participant once it reconnect to ZK, right? >>>>>> >>>>>> Thanks, >>>>>> Bo >>>>>> >>>>>> On Tue, Jan 23, 2018 at 10:39 AM, kishore g <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Relying on reuse of the same statemodel instance by Helix might make >>>>>>> it model too rigid and tied to current implementation in Helix. Let's >>>>>>> not >>>>>>> expose that to the clients. >>>>>>> >>>>>>> Helix internally carries over the previous partitions assignment >>>>>>> during startup but sets the state to initial state (OFFLINE in this >>>>>>> case) >>>>>>> by default. If the client really needs to know what was the previous >>>>>>> state, >>>>>>> we can provide a hook to the client to compute the initial state. In any >>>>>>> case, lets hear more from Bo before making any changes. >>>>>>> >>>>>>> On Tue, Jan 23, 2018 at 9:19 AM, Lei Xia <[email protected]> wrote: >>>>>>> >>>>>>>> Hi, Bo >>>>>>>> >>>>>>>> >>>>>>>> As Kishore commented, your offline->slave state transition >>>>>>>> callback needs some logic to determine whether a bootstrap or catch up >>>>>>>> is >>>>>>>> needed to transit a replica to slave. A common way is to persist the >>>>>>>> data >>>>>>>> version of a local partition somewhere, and during offline->slave, >>>>>>>> comparing local version (if there is) with current Master's version to >>>>>>>> determine if bootstrap (if version is null or too old) or catch-up is >>>>>>>> needed. >>>>>>>> >>>>>>>> >>>>>>>> There is one more difference in how Helix handles participant >>>>>>>> restarting vs ZK session. When a participant starts (or restarts), it >>>>>>>> creates a new StateModel (by calling CreateStateModel() in your >>>>>>>> StateModelFactory) for each partition. However, if a participant >>>>>>>> loses ZK >>>>>>>> session and comes back (with new session), it will reuse the >>>>>>>> StateModel for >>>>>>>> partitions that were there before instead of creating a new one. You >>>>>>>> may >>>>>>>> leverage this to tell whether a participant has been restarted or just >>>>>>>> re-established the ZK connection. >>>>>>>> >>>>>>>> >>>>>>>> In addition, the Delayed feature in DelayedAutoRebalancer is a >>>>>>>> little different then what you may understand. When you lose a >>>>>>>> participant >>>>>>>> (e.g, crashed, in maintenance), you lose one replica for some >>>>>>>> partitions. >>>>>>>> In this situation, Helix will usually bring up a new replica in some >>>>>>>> other >>>>>>>> live node immediately to maintain the required replica count. >>>>>>>> However, this may bring performance impact since bringing a new >>>>>>>> replica can >>>>>>>> require data bootstrap in new node. If you expect the original >>>>>>>> participant >>>>>>>> will be back online soon and also you can tolerate losing one or more >>>>>>>> replica in short-term, then you can set a delay time here. In which >>>>>>>> Helix >>>>>>>> will not bring a new replica before this time. Hope that makes it more >>>>>>>> clear. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> Lei >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> *Lei Xia* >>>>>>>> >>>>>>>> >>>>>>>> Data Infra/Helix >>>>>>>> >>>>>>>> [email protected] >>>>>>>> www.linkedin.com/in/lxia1 >>>>>>>> ------------------------------ >>>>>>>> *From:* Bo Liu <[email protected]> >>>>>>>> *Sent:* Monday, January 22, 2018 11:12:48 PM >>>>>>>> *To:* [email protected] >>>>>>>> *Subject:* differentiate between bootstrap and a soft failure >>>>>>>> >>>>>>>> Hi There, >>>>>>>> >>>>>>>> I am using FULL_AUTO with MasterSlave and DelayedAutoRebalancer. >>>>>>>> How can a participant differentiate between these two cases: >>>>>>>> >>>>>>>> 1) when a participant first joins a cluster, it will be requested >>>>>>>> to transit from OFFLINE to SLAVE. Since the participant doesn't have >>>>>>>> any >>>>>>>> data for this partition, it needs to bootstrap and download data from >>>>>>>> another participant or a data source. >>>>>>>> 2) when a participant loses its ZK session, the controller will >>>>>>>> automatically change the participant to be OFFLINE in ZK. If the >>>>>>>> participant manages to establish a new session to ZK before the delayed >>>>>>>> time threshold, the controller will send a request to it to switch from >>>>>>>> OFFLINE to SLAVE. In this case, the participant already has the data >>>>>>>> for >>>>>>>> the partition, so it doesn't need to bootstrap from other data sources. >>>>>>>> >>>>>>>> -- >>>>>>>> Best regards, >>>>>>>> Bo >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best regards, >>>>>> Bo >>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Best regards, >>>> Bo >>>> >>>> >>> >>> >>> -- >>> Best regards, >>> Bo >>> >>> >> -- Best regards, Bo
