RE: State transitions of partitions and Distributed Cluster controllers

Subramanian Raghunathan Thu, 28 Jan 2016 18:53:19 -0800

Hi Helix Team,

@Kishore
This definitely helps a lot. Thanks.


Yes, reset will be called when you lose zk session. It will also be invoked 
when a partition goes to ERROR state and you want to get back to OFFLINE state 
(I am not 100% sure whether the reset API or the ERROR to OFFLINE transition is 
invoked). Jason might be able to answer that.

My question was about the event life cycle.
For example, when the session is lost, the reset() method is invoked.
When the session later reconnects, will the corresponding handler method be 
invoked again, based on the state model?

Thanks & Regards,
Subramanian Raghunathan.

From: kishore g [mailto:[email protected]]
Sent: Wednesday, January 27, 2016 11:31 AM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: Re: State transitions of partitions and Distributed Cluster controllers

Thanks for sending it again.

I looked at the code. Even though the retry is handled on the participant, it 
looks like we are not setting the retry count for state transition messages. We 
do have the ability to set it for custom message types.

The fix is easy: we just need to call message.setRetryCount in this class:

https://github.com/apache/helix/blob/9e51cb7bdf8424df46c6fa353e7c80d984c21193/helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationStage.java

We can read the retry count from cluster config.
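
To make the idea of a retry count concrete, here is a minimal, self-contained sketch. It does not use any Helix classes: TransitionRetrySketch and runWithRetries are hypothetical names, and the reading of the count as "extra attempts after the first failure" is an assumption for illustration, not necessarily how the participant applies it.

```java
import java.util.concurrent.Callable;

public class TransitionRetrySketch {

    /**
     * Runs the transition once and, if it throws, up to retryCount more times.
     * Returns the number of attempts used, or -1 if every attempt failed.
     * (Hypothetical helper; the semantics are an assumption for illustration.)
     */
    public static int runWithRetries(Callable<Void> transition, int retryCount) {
        for (int attempt = 1; attempt <= retryCount + 1; attempt++) {
            try {
                transition.call();
                return attempt;
            } catch (Exception e) {
                // Attempt failed; fall through and try again if attempts remain.
            }
        }
        return -1; // retries exhausted, the transition is considered failed
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // A transition that fails on its first two invocations.
        Callable<Void> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not ready yet");
            }
            return null;
        };
        System.out.println("attempts used: " + runWithRetries(flaky, 3));
    }
}
```

In the real fix the count would presumably be read from the cluster config and passed to message.setRetryCount, as described above.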

I recently sent another email with instructions for setting up distributed 
controllers. In short, the steps are:

helixadmin create-cluster super_cluster
helixadmin addInstance super_cluster  controller1
helixadmin addInstance super_cluster  controller2
helixadmin addInstance super_cluster  controller3

Start the three controllers in distributed mode and provide super_cluster as 
the cluster name.

Now any time you create a cluster, you can add that cluster as a resource in 
the super_cluster. One of the controllers will automatically start managing the 
new cluster. For example:
helixadmin create-cluster cluster1
helixadmin addresource super_cluster cluster1 AUTO mode leaderstandbymodel

I don't remember the exact commands off the top of my head, but they should 
look something like that.

Yes, reset will be called when you lose zk session. It will also be invoked 
when a partition goes to ERROR state and you want to get back to OFFLINE state 
(I am not 100% sure whether the reset API or the ERROR to OFFLINE transition is 
invoked). Jason might be able to answer that.
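
As a rough picture of the life cycle being discussed, here is a small, self-contained sketch. It is a toy model, not Helix code: ToyStateModel, transition and simulate are hypothetical names, and the sequence assumes that reset() puts the model back in OFFLINE and that transitions are then driven again from OFFLINE after reconnection. Whether Helix behaves exactly this way is the open question here.

```java
import java.util.ArrayList;
import java.util.List;

public class ResetLifecycleSketch {

    /** Toy state model that records every callback it receives. Not a Helix class. */
    static class ToyStateModel {
        String state = "OFFLINE";
        final List<String> events = new ArrayList<>();

        /** Applies one transition, rejecting it if the model is not in the 'from' state. */
        void transition(String from, String to) {
            if (!state.equals(from)) {
                throw new IllegalStateException("expected " + from + " but was " + state);
            }
            state = to;
            events.add(from + "->" + to);
        }

        /** Assumed behaviour: drop back to the initial state without a transition. */
        void reset() {
            state = "OFFLINE";
            events.add("reset");
        }
    }

    /** Replays the assumed sequence and returns the recorded callbacks in order. */
    public static List<String> simulate() {
        ToyStateModel model = new ToyStateModel();
        model.transition("OFFLINE", "STANDBY");
        model.transition("STANDBY", "LEADER");
        model.reset();                          // zk session lost
        model.transition("OFFLINE", "STANDBY"); // session re-established
        return model.events;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```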

Hope that helps.


On Wed, Jan 27, 2016 at 10:51 AM, Subramanian Raghunathan <[email protected]> 
wrote:
Hi Helix Team,

I am evaluating Helix as a cluster management framework. I believe it’s very 
modular and highly customizable, with a variety of out-of-the-box capabilities. 
Kudos to the team!

I have the following queries:


1)      How do I configure the number of retries in state transition handlers?

http://markmail.org/message/vgc4nksocolqiqx5
        I referred to this particular mail conversation: “you can configure the 
number of retries before a transition is considered as failed”


2)      Please point me to an example or the interfaces for starting a 
distributed cluster controller and for adding the clusters that the controllers 
are supposed to manage.


3)      What is the event life cycle of the reset() method in 
TransitionHandler?

a.      I believe it gets called if the ZooKeeper client session is lost or 
there’s an update to the cluster configuration.

Note: I am using the “helix-0.7.1” version.

Thanks & Regards,
Subramanian Raghunathan
