callback for controller becoming master
Does the Helix API provide a way to register a callback that gets called when a Helix controller becomes master?

Thanks,
Dave
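A hedged sketch of one way this is commonly done (assuming the standard Helix Java API; the cluster name, instance name, and ZooKeeper address below are hypothetical, and the listener package should be checked against your Helix version): register a ControllerChangeListener on the HelixManager and check isLeader() inside the callback, since onControllerChange fires on any leadership change, not only when this node becomes the leader.

```java
import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.api.listeners.ControllerChangeListener;

public class LeadershipWatcher {
  public static void main(String[] args) throws Exception {
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "MyCluster", "controller_1", InstanceType.CONTROLLER, "localhost:2181");
    manager.connect();
    // onControllerChange fires whenever controller leadership changes;
    // isLeader() tells us whether *this* manager is now the leader.
    manager.addControllerListener((ControllerChangeListener) context -> {
      if (manager.isLeader()) {
        System.out.println("This controller just became the leader");
      }
    });
  }
}
```

This requires a running ZooKeeper and Helix cluster, so it is a sketch rather than a standalone program.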
Re: differentiate between bootstrap and a soft failure
After applying those changes, I am still observing the same behavior.

On Fri, Jan 26, 2018 at 11:50 AM, Bo Liu wrote:
> Thanks Lei, will try it out.
> Yes, a tutorial page for this new feature would be very helpful.
>
> On Jan 26, 2018 09:49, "Lei Xia" wrote:
>
>> Hi, Bo
>>
>> That is not the expected behavior. Would you add (or replace) the
>> following configs in your idealstate? "MIN_ACTIVE_REPLICAS" tells Helix
>> the minimum number of replicas it should maintain. For example, if your
>> total replica count is 3 and you lose 2 instances, Helix will immediately
>> bring at least 1 more replica online, regardless of the delay setting, to
>> meet the minimum-replica requirement.
>>
>> ,"REBALANCE_STRATEGY":"org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy"
>> ,"MIN_ACTIVE_REPLICAS":"2"
>> ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebalancer.DelayedAutoRebalancer"
>>
>> Also please add the following two configs to your ClusterConfig. In
>> particular, DELAY_REBALANCE_TIME specifies how long Helix should wait
>> before bringing up a new replica; e.g., if an instance goes down and does
>> not come back within 60 ms, Helix will move all replicas on that instance
>> to other live instances.
>>
>> "DELAY_REBALANCE_ENABLED" : "true",
>> "DELAY_REBALANCE_TIME" : "60",
>>
>> Please give it a try and let us know how it works. Apologies that we do
>> not have an updated manual on our website; we are working on updating our
>> developer manuals for all the latest features, and they will be out soon.
>>
>> Thanks
>> Lei
>>
>> On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu wrote:
>>
>>> I tried to run with DelayedAutoRebalancer. When a participant host
>>> (localhost_12913) was killed, shards hosted on it were not moved, which
>>> is expected.
>>> And the external view for the resource is:
>>>
>>> ExternalView for test:
>>> {
>>>   "id" : "test",
>>>   "mapFields" : {
>>>     "test_0" : {
>>>       "localhost_12914" : "ONLINE"
>>>     },
>>>     "test_1" : {
>>>       "localhost_12914" : "ONLINE",
>>>       "localhost_12915" : "ONLINE"
>>>     },
>>>     "test_2" : {
>>>       "localhost_12915" : "ONLINE"
>>>     }
>>>   },
>>>   "listFields" : {
>>>   },
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "3",
>>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>>     "REPLICAS" : "2",
>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>   }
>>> }
>>>
>>> However, when I restarted the participant, the participant didn't get
>>> any new transition requests, and the external view became:
>>>
>>> ExternalView for test:
>>> {
>>>   "id" : "test",
>>>   "mapFields" : {
>>>     "test_0" : {
>>>       "localhost_12913" : "OFFLINE",
>>>       "localhost_12914" : "ONLINE"
>>>     },
>>>     "test_1" : {
>>>       "localhost_12914" : "ONLINE",
>>>       "localhost_12915" : "ONLINE"
>>>     },
>>>     "test_2" : {
>>>       "localhost_12913" : "OFFLINE",
>>>       "localhost_12915" : "ONLINE"
>>>     }
>>>   },
>>>   "listFields" : {
>>>   },
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "3",
>>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>>     "REPLICAS" : "2",
>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>   }
>>> }
>>>
>>> I am wondering if this is the expected behavior?
>>>
>>> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu wrote:
>>> Great, thank you for the prompt reply.
>>>
>>> Thanks,
>>> Bo
>>>
>>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g wrote:
>>> > 1. Yes, you can set the max-transitions constraint at per-partition,
>>> > per-instance, or per-resource scope. There is a HelixAdmin API to set
>>> > the constraint; I don't have it handy.
>>> > 2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions
>>> > that were on the host and are still present in the idealstate. If a
>>> > partition has been removed from the idealstate, it will send an
>>> > OFFLINE->DROPPED transition.
>>> > 3. Right. Expiry is the same as a restart. The only difference is that
>>> > with expiry, the reset method is called on the state model, where one
>>> > can plug in custom behavior.
>>> >
>>> > On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu
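On point 3 above, a hedged sketch of where that reset hook lives (assuming the standard Helix participant state-model API; the class name and transition bodies here are illustrative, not from the thread):

```java
import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

@StateModelInfo(initialState = "OFFLINE", states = {"ONLINE", "OFFLINE"})
public class OnlineOfflineStateModel extends StateModel {
  @Transition(from = "OFFLINE", to = "ONLINE")
  public void onBecomeOnlineFromOffline(Message message, NotificationContext context) {
    // open the partition for serving
  }

  @Transition(from = "ONLINE", to = "OFFLINE")
  public void onBecomeOfflineFromOnline(Message message, NotificationContext context) {
    // stop serving the partition
  }

  @Override
  public void reset() {
    // Called on ZooKeeper session expiry (unlike a plain process restart):
    // a place to clean up in-memory state before Helix re-drives
    // transitions from the initial state.
    super.reset();
  }
}
```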
Re: differentiate between bootstrap and a soft failure
Thanks Lei, will try it out. Yes, a tutorial page for this new feature would be very helpful.

On Jan 26, 2018 09:49, "Lei Xia" wrote:
> Hi, Bo
>
> That is not the expected behavior. Would you add (or replace) the
> following configs in your idealstate? "MIN_ACTIVE_REPLICAS" tells Helix
> the minimum number of replicas it should maintain. For example, if your
> total replica count is 3 and you lose 2 instances, Helix will immediately
> bring at least 1 more replica online, regardless of the delay setting, to
> meet the minimum-replica requirement.
>
> ,"REBALANCE_STRATEGY":"org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy"
> ,"MIN_ACTIVE_REPLICAS":"2"
> ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebalancer.DelayedAutoRebalancer"
>
> Also please add the following two configs to your ClusterConfig. In
> particular, DELAY_REBALANCE_TIME specifies how long Helix should wait
> before bringing up a new replica; e.g., if an instance goes down and does
> not come back within 60 ms, Helix will move all replicas on that instance
> to other live instances.
>
> "DELAY_REBALANCE_ENABLED" : "true",
> "DELAY_REBALANCE_TIME" : "60",
>
> Please give it a try and let us know how it works. Apologies that we do
> not have an updated manual on our website; we are working on updating our
> developer manuals for all the latest features, and they will be out soon.
>
> Thanks
> Lei
>
> On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu wrote:
>
>> I tried to run with DelayedAutoRebalancer. When a participant host
>> (localhost_12913) was killed, shards hosted on it were not moved, which
>> is expected.
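Putting Lei's settings together in one place, a small self-contained sketch (the field names and values are from this thread; the sanity check is an illustration, not part of Helix) that assembles the idealstate and ClusterConfig fragments and verifies MIN_ACTIVE_REPLICAS cannot exceed the total replica count:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DelayedRebalanceConfig {
  // Idealstate fields recommended in the thread.
  public static Map<String, String> idealStateFields() {
    Map<String, String> f = new LinkedHashMap<>();
    f.put("REBALANCE_STRATEGY",
        "org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy");
    f.put("MIN_ACTIVE_REPLICAS", "2");
    f.put("REBALANCER_CLASS_NAME",
        "org.apache.helix.controller.rebalancer.DelayedAutoRebalancer");
    return f;
  }

  // ClusterConfig fields recommended in the thread.
  public static Map<String, String> clusterConfigFields() {
    Map<String, String> f = new LinkedHashMap<>();
    f.put("DELAY_REBALANCE_ENABLED", "true");
    f.put("DELAY_REBALANCE_TIME", "60"); // milliseconds
    return f;
  }

  // MIN_ACTIVE_REPLICAS must not exceed the total replica count,
  // or the minimum can never be satisfied.
  public static boolean isSane(int totalReplicas, Map<String, String> idealState) {
    return Integer.parseInt(idealState.get("MIN_ACTIVE_REPLICAS")) <= totalReplicas;
  }

  public static void main(String[] args) {
    System.out.println(isSane(3, idealStateFields()));
  }
}
```

With Lei's example of 3 total replicas, the check passes; with only 1 replica configured, MIN_ACTIVE_REPLICAS of 2 would be unsatisfiable.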
>> And the external view for the resource is:
>>
>> ExternalView for test:
>> {
>>   "id" : "test",
>>   "mapFields" : {
>>     "test_0" : {
>>       "localhost_12914" : "ONLINE"
>>     },
>>     "test_1" : {
>>       "localhost_12914" : "ONLINE",
>>       "localhost_12915" : "ONLINE"
>>     },
>>     "test_2" : {
>>       "localhost_12915" : "ONLINE"
>>     }
>>   },
>>   "listFields" : {
>>   },
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "3",
>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>     "REPLICAS" : "2",
>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>   }
>> }
>>
>> However, when I restarted the participant, the participant didn't get
>> any new transition requests, and the external view became:
>>
>> ExternalView for test:
>> {
>>   "id" : "test",
>>   "mapFields" : {
>>     "test_0" : {
>>       "localhost_12913" : "OFFLINE",
>>       "localhost_12914" : "ONLINE"
>>     },
>>     "test_1" : {
>>       "localhost_12914" : "ONLINE",
>>       "localhost_12915" : "ONLINE"
>>     },
>>     "test_2" : {
>>       "localhost_12913" : "OFFLINE",
>>       "localhost_12915" : "ONLINE"
>>     }
>>   },
>>   "listFields" : {
>>   },
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "3",
>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>     "REPLICAS" : "2",
>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>   }
>> }
>>
>> I am wondering if this is the expected behavior?
>>
>> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu wrote:
>>
>>> Great, thank you for the prompt reply.
>>>
>>> Thanks,
>>> Bo
>>>
>>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g wrote:
>>> 1. Yes, you can set the max-transitions constraint at per-partition,
>>> per-instance, or per-resource scope. There is a HelixAdmin API to set
>>> the constraint; I don't have it handy.
>>> 2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions
>>> that were on the host and are still present in the idealstate. If a
>>> partition has been removed from the idealstate, it will send an
>>> OFFLINE->DROPPED transition.
>>> 3. Right. Expiry is the same as a restart. The only difference is that
>>> with expiry, the reset method is called on the state model, where one
>>> can plug in custom behavior.
>>>
>>> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu wrote:
>>> > Thanks Kishore & Lei!
>>> >
>>> > It's a good point to rely on the data in a local partition to decide
>>> > whether a bootstrap is needed or catching up is good enough.
>>> >
>>> > A few more questions.
>>> >
>>> > 1. Is there a way to allow at most one transition for a partition at a
>>> > time? During a state transition, a
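On the "at most one transition per partition" question, a hedged sketch of the HelixAdmin constraint API kishore mentions (attribute names follow the Helix throttling tutorial; the ZooKeeper address, cluster name, and constraint id here are hypothetical and should be adapted, and the API should be checked against your Helix version):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ClusterConstraints.ConstraintType;
import org.apache.helix.model.builder.ConstraintItemBuilder;

public class TransitionThrottle {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
    // Allow at most one in-flight state-transition message per partition;
    // ".*" matches every partition name.
    ConstraintItemBuilder builder = new ConstraintItemBuilder();
    builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
           .addConstraintAttribute("PARTITION", ".*")
           .addConstraintAttribute("CONSTRAINT_VALUE", "1");
    admin.setConstraint("MyCluster", ConstraintType.MESSAGE_CONSTRAINT,
        "onePerPartition", builder.build());
  }
}
```

Scoping the same attribute to "INSTANCE" or "RESOURCE" instead of "PARTITION" gives the per-instance and per-resource variants kishore describes. This requires a running ZooKeeper, so it is a sketch rather than a standalone program.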