callback for controller becoming master

2018-01-26 Thread Dave Peterson
Does the Helix API provide a way to register a callback that gets called
when a Helix controller becomes master?

Thanks,
Dave
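
For context, a minimal sketch of the kind of hook being asked about, assuming the
ControllerChangeListener interface together with HelixManager.addControllerListener()
and HelixManager.isLeader() (the listener's package location varies by Helix version);
the onBecameLeader() method is a hypothetical application hook, not part of Helix:

import org.apache.helix.ControllerChangeListener;
import org.apache.helix.HelixManager;
import org.apache.helix.NotificationContext;

// Sketch only: the callback fires on controller/leadership changes; isLeader()
// then tells us whether this manager is the one that just became leader/master.
// Note the callback may fire more than once, so deduplicate if needed.
public class LeadershipWatcher implements ControllerChangeListener {
  private final HelixManager manager;

  public LeadershipWatcher(HelixManager manager) {
    this.manager = manager;
  }

  @Override
  public void onControllerChange(NotificationContext changeContext) {
    if (manager.isLeader()) {
      onBecameLeader(); // hypothetical application hook
    }
  }

  private void onBecameLeader() {
    // application-specific logic to run when this controller becomes master
  }
}

// Registration, after the controller's HelixManager has connected:
//   manager.addControllerListener(new LeadershipWatcher(manager));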


Re: differentiate between bootstrap and a soft failure

2018-01-26 Thread Bo Liu
After applying those changes, I am still observing the same behavior.


On Fri, Jan 26, 2018 at 11:50 AM, Bo Liu  wrote:

> Thanks Lei, will try it out.
> Yes, a tutorial page for this new feature would be very helpful.
>
> On Jan 26, 2018 09:49, "Lei Xia"  wrote:
>
>> Hi, Bo
>>
>>    That is not the expected behavior.  Would you add (or replace) the
>> following configs in your idealstate?  The "MIN_ACTIVE_REPLICAS" config
>> tells Helix the minimum number of replicas it should maintain. For example,
>> if your total replica count is 3 and you lose 2 instances, Helix will bring
>> at least 1 more replica online immediately, regardless of the delay
>> setting, to meet the minimum replica requirement.
>>
>> ,"REBALANCE_STRATEGY":"org.apache.helix.controller.rebalance
>> r.strategy.CrushRebalanceStrategy"
>> , "MIN_ACTIVE_REPLICAS":"2"
>> ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebala
>> ncer.DelayedAutoRebalancer"
>>
>> Also, please add the following two configs to your ClusterConfig. In
>> particular, DELAY_REBALANCE_TIME specifies how long Helix should delay
>> bringing up a new replica, e.g., if an instance is down and does not come
>> back within 60ms, Helix will move all replicas on that instance to other
>> live instances.
>>
>> "DELAY_REBALANCE_ENABLED" : "true",
>> "DELAY_REBALANCE_TIME" : "60",
>>
>>
>> Please give it a try and let us know how it works. Apologies for not having
>> an updated manual on our website; we are working on updating all of our
>> developer manuals for the latest features, and they will be out soon.
>>
>>
>> Thanks
>> Lei
>>
>> On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu  wrote:
>>
>>> I tried to run with DelayedAutoRebalancer. When a participant host
>>> (localhost_12913) was killed, shards hosted on it were not moved, which is
>>> expected.
>>> The external view for the resource was:
>>>
>>> ExternalView for test:
>>>
>>> {
>>>   "id" : "test",
>>>   "mapFields" : {
>>>     "test_0" : {
>>>       "localhost_12914" : "ONLINE"
>>>     },
>>>     "test_1" : {
>>>       "localhost_12914" : "ONLINE",
>>>       "localhost_12915" : "ONLINE"
>>>     },
>>>     "test_2" : {
>>>       "localhost_12915" : "ONLINE"
>>>     }
>>>   },
>>>   "listFields" : {
>>>   },
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "3",
>>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>>     "REPLICAS" : "2",
>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>   }
>>> }
>>>
>>> However, when I restarted the participant, it didn't get any new
>>> transition requests, and the external view became:
>>>
>>> ExternalView for test:
>>>
>>> {
>>>   "id" : "test",
>>>   "mapFields" : {
>>>     "test_0" : {
>>>       "localhost_12913" : "OFFLINE",
>>>       "localhost_12914" : "ONLINE"
>>>     },
>>>     "test_1" : {
>>>       "localhost_12914" : "ONLINE",
>>>       "localhost_12915" : "ONLINE"
>>>     },
>>>     "test_2" : {
>>>       "localhost_12913" : "OFFLINE",
>>>       "localhost_12915" : "ONLINE"
>>>     }
>>>   },
>>>   "listFields" : {
>>>   },
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "3",
>>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>>     "REPLICAS" : "2",
>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>   }
>>> }
>>>
>>> I am wondering whether this is the expected behavior.
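
(As an aside, a short sketch of pulling the same external view programmatically,
assuming the cluster is named "myCluster" and ZooKeeper is at localhost:2181,
both illustrative; it just prints each partition's {instance: state} map so the
before/after-restart views can be compared:)

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ExternalView;

public class DumpExternalView {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // illustrative
    // Fetch the current external view of the "test" resource.
    ExternalView ev = admin.getResourceExternalView("myCluster", "test");
    for (String partition : ev.getPartitionSet()) {
      System.out.println(partition + " -> " + ev.getStateMap(partition));
    }
  }
}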
>>>
>>>
>>>
>>> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu  wrote:
>>>
 Great, thank you for the prompt reply.

 Thanks,
 Bo

 On Tue, Jan 23, 2018 at 1:47 PM, kishore g  wrote:

>
>    1. Yes, you can set the max-transitions constraint at the per-partition,
>    per-instance, or per-resource scope. There is a Helix admin API to set
>    the constraint; I don't have it handy.
>    2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions
>    that were on the host and are still present in the idealstate. If a
>    partition is removed from the idealstate, it will send an
>    OFFLINE->DROPPED transition.
>    3. Right. Expiry is the same as a restart. The only difference is that
>    with expiry, Helix calls the reset method on the statemodel, where one
>    can plug in custom behavior.
>
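
Regarding point 1 above, a rough sketch of the kind of HelixAdmin constraint
call being referred to, following the message-constraint mechanism (the
constraint id, attribute values, cluster name, and ZooKeeper address are all
illustrative); with CONSTRAINT_VALUE set to "1" and the scope narrowed to the
partitions of one resource, at most one STATE_TRANSITION message should be in
flight per partition at a time:

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ClusterConstraints.ConstraintType;
import org.apache.helix.model.builder.ConstraintItemBuilder;

public class AddTransitionConstraint {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // illustrative

    // Throttle: at most 1 pending state-transition message per partition of "test".
    ConstraintItemBuilder builder = new ConstraintItemBuilder();
    builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
           .addConstraintAttribute("RESOURCE", "test")
           .addConstraintAttribute("PARTITION", ".*")
           .addConstraintAttribute("CONSTRAINT_VALUE", "1");

    admin.setConstraint("myCluster", ConstraintType.MESSAGE_CONSTRAINT,
        "onePendingTransitionPerPartition", builder.build());
  }
}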
>
>
> On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu 

Re: differentiate between bootstrap and a soft failure

2018-01-26 Thread Bo Liu
Thanks Lei, will try it out.
Yes, a tutorial page for this new feature would be very helpful.

On Jan 26, 2018 09:49, "Lei Xia"  wrote:

> Hi, Bo
>
>    That is not the expected behavior.  Would you add (or replace) the
> following configs in your idealstate?  The "MIN_ACTIVE_REPLICAS" config
> tells Helix the minimum number of replicas it should maintain. For example,
> if your total replica count is 3 and you lose 2 instances, Helix will bring
> at least 1 more replica online immediately, regardless of the delay setting,
> to meet the minimum replica requirement.
>
> ,"REBALANCE_STRATEGY":"org.apache.helix.controller.
> rebalancer.strategy.CrushRebalanceStrategy"
> , "MIN_ACTIVE_REPLICAS":"2"
> ,"REBALANCER_CLASS_NAME":"org.apache.helix.controller.rebalancer.
> DelayedAutoRebalancer"
>
> Also, please add the following two configs to your ClusterConfig. In
> particular, DELAY_REBALANCE_TIME specifies how long Helix should delay
> bringing up a new replica, e.g., if an instance is down and does not come
> back within 60ms, Helix will move all replicas on that instance to other
> live instances.
>
> "DELAY_REBALANCE_ENABLED" : "true",
> "DELAY_REBALANCE_TIME" : "60",
>
>
> Please give it a try and let us know how it works. Apologies for not having
> an updated manual on our website; we are working on updating all of our
> developer manuals for the latest features, and they will be out soon.
>
>
> Thanks
> Lei
>
> On Thu, Jan 25, 2018 at 6:17 PM, Bo Liu  wrote:
>
>> I tried to run with DelayedAutoRebalancer. When a participant host
>> (localhost_12913) was killed, shards hosted on it were not moved, which is
>> expected.
>> The external view for the resource was:
>>
>> ExternalView for test:
>>
>> {
>>   "id" : "test",
>>   "mapFields" : {
>>     "test_0" : {
>>       "localhost_12914" : "ONLINE"
>>     },
>>     "test_1" : {
>>       "localhost_12914" : "ONLINE",
>>       "localhost_12915" : "ONLINE"
>>     },
>>     "test_2" : {
>>       "localhost_12915" : "ONLINE"
>>     }
>>   },
>>   "listFields" : {
>>   },
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "3",
>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>     "REPLICAS" : "2",
>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>   }
>> }
>>
>> However, when I restarted the participant, it didn't get any new
>> transition requests, and the external view became:
>>
>> ExternalView for test:
>>
>> {
>>   "id" : "test",
>>   "mapFields" : {
>>     "test_0" : {
>>       "localhost_12913" : "OFFLINE",
>>       "localhost_12914" : "ONLINE"
>>     },
>>     "test_1" : {
>>       "localhost_12914" : "ONLINE",
>>       "localhost_12915" : "ONLINE"
>>     },
>>     "test_2" : {
>>       "localhost_12913" : "OFFLINE",
>>       "localhost_12915" : "ONLINE"
>>     }
>>   },
>>   "listFields" : {
>>   },
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "3",
>>     "REBALANCER_CLASS_NAME" : "DelayedAutoRebalancer",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REBALANCE_STRATEGY" : "AutoRebalanceStrategy",
>>     "REPLICAS" : "2",
>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>   }
>> }
>>
>> I am wondering whether this is the expected behavior.
>>
>>
>>
>> On Tue, Jan 23, 2018 at 2:38 PM, Bo Liu  wrote:
>>
>>> Great, thank you for the prompt reply.
>>>
>>> Thanks,
>>> Bo
>>>
>>> On Tue, Jan 23, 2018 at 1:47 PM, kishore g  wrote:
>>>

1. Yes, you can set the max-transitions constraint at the per-partition,
per-instance, or per-resource scope. There is a Helix admin API to set the
constraint; I don't have it handy.
2. Yes, Helix will send OFFLINE->SLAVE transitions for all partitions that
were on the host and are still present in the idealstate. If a partition is
removed from the idealstate, it will send an OFFLINE->DROPPED transition.
3. Right. Expiry is the same as a restart. The only difference is that with
expiry, Helix calls the reset method on the statemodel, where one can plug
in custom behavior.



 On Tue, Jan 23, 2018 at 11:57 AM, Bo Liu  wrote:

> Thanks Kishore & Lei!
>
> It's a good point to rely on the data in a local partition to decide
> whether a bootstrap is needed or catching up is good enough.
>
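
As a rough illustration of that idea (everything below other than the Helix
state-model classes is hypothetical application code, not anything Helix
provides), the OFFLINE->ONLINE handler of the state model could look at what is
already on local disk and choose between a full bootstrap and an incremental
catch-up:

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

@StateModelInfo(initialState = "OFFLINE", states = {"ONLINE", "OFFLINE"})
public class OnlineOfflineStoreModel extends StateModel {
  private static final long MAX_CATCHUP_LAG = 10_000; // illustrative threshold

  private final String partitionName;

  public OnlineOfflineStoreModel(String partitionName) {
    this.partitionName = partitionName;
  }

  @Transition(from = "OFFLINE", to = "ONLINE")
  public void onBecomeOnlineFromOffline(Message message, NotificationContext context) {
    long local = localWatermark();   // highest update applied locally, -1 if no data
    long leader = leaderWatermark(); // highest update on a currently serving replica
    if (local < 0 || leader - local > MAX_CATCHUP_LAG) {
      bootstrap();                   // no local copy, or too far behind: full copy
    } else {
      catchUp(local);                // otherwise replay only the missing updates
    }
  }

  @Transition(from = "ONLINE", to = "OFFLINE")
  public void onBecomeOfflineFromOnline(Message message, NotificationContext context) {
    // stop serving the partition
  }

  // Hypothetical application-level helpers, stubbed for the sketch.
  private long localWatermark() { return -1L; }
  private long leaderWatermark() { return 0L; }
  private void bootstrap() { }
  private void catchUp(long fromWatermark) { }
}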
> A few more questions.
>
> 1. Is there a way to allow at most one transition for a partition at a
> time? During a state transition, a