I see. ControllerListener seems interesting. Would I get notifications of INIT and FINALIZE types whenever a controller becomes leader or loses its leadership? That way I can easily start my rebalancer thread in the INIT call and shut it down in the FINALIZE call.
Thanks
Varun

On Wed, Aug 6, 2014 at 4:07 PM, Kanak Biscuitwala <[email protected]> wrote:

> Yeah, it's not possible to attach to a controller because the custom code runner essentially treats each runner as a participant for a logical resource corresponding to leadership of running that code.
>
> Rather than using HelixCustomCodeRunner, perhaps you can just use a basic scheduled timer task on each controller, and when it is triggered, check if you're the leader, and if so, run the ideal state update code. Another thing you can do is helixManager#addControllerListener(), and on the callback check if you're leader, and schedule the timer if you are, and otherwise cancel the timer.
>
> Here's how you can check if you're leader:
>
> HelixDataAccessor accessor = helixManager.getHelixDataAccessor();
> LiveInstance leader = accessor.getProperty(accessor.keyBuilder().controllerLeader());
> if (leader != null && leader.getId().equals(myId)) {
>   // I am the leader
> }
>
> ------------------------------
> Date: Wed, 6 Aug 2014 15:56:35 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> I am attempting to attach the HelixCustomCodeRunner to a controller instance - not really running a controller alongside each of my nodes. HelixCustomCodeRunner.start() is failing as above with a nullpointer exception at line 120. Is it not possible to attach the HelixCustomCodeRunner to a controller instance ?
>
> Thanks !
> Varun
>
> On Wed, Aug 6, 2014 at 3:46 PM, Kanak Biscuitwala <[email protected]> wrote:
>
> I would suggest maintaining 2 HelixManager connections: one for CONTROLLER, one for PARTICIPANT (I'm assuming you're running a controller instance alongside each of your nodes). It's wasteful, but you should just leave the controller one alone, and then attach the state model factory and custom code runner to the participant one.
>
> Kanak
>
> ------------------------------
> Date: Wed, 6 Aug 2014 15:43:16 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> Without this I was getting a null pointer exception in the CustomCodeRunner - Helix 0.6.3 - Lines 120 and 121
>
> 120 StateMachineEngine stateMach = _manager.getStateMachineEngine();
> 121 stateMach.registerStateModelFactory(LEADER_STANDBY, _stateModelFty, _resourceName);
>
> My code calls the HelixCustomCodeRunner in the following way:
>
> new HelixCustomCodeRunner(helixManager, zookeeperQuorum)
>     .on(HelixConstants.ChangeType.LIVE_INSTANCE).invoke(myCallback)
>     .usingLeaderStandbyModel("HDFS_rebalancer").start();
>
> On Wed, Aug 6, 2014 at 3:08 PM, Kanak Biscuitwala <[email protected]> wrote:
>
> Hi Varun,
>
> getStateMachineEngine is only supported for InstanceType.PARTICIPANT. May I ask why you need your controller to have state transition callbacks?
>
> In future releases, we're creating separate classes for each role, so hopefully that will resolve confusions like this moving forward.
>
> Kanak
>
> ------------------------------
> Date: Wed, 6 Aug 2014 15:04:36 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> I am getting a weird null pointer exception while instantiating the controller.
> Here is the error:
>
> this.helixManager = HelixManagerFactory.getZKHelixManager(this.clusterName,
>     InetAddress.getLocalHost().getHostName() + ":" + thriftPort,
>     InstanceType.CONTROLLER,
>     zookeeperQuorum);
> StateMachineEngine machineEngine = helixManager.getStateMachineEngine();
> machineEngine.registerStateModelFactory("HDFS_state_machine",
>     new OnlineOfflineStateModelFactory(1000));
> this.helixManager.connect();
>
> I get a NullPointerException at line #3 because getStateMachineEngine() returns a null value. Is that supposed to happen ?
>
> Thanks
> Varun
>
> On Fri, Aug 1, 2014 at 11:23 AM, Zhen Zhang <[email protected]> wrote:
>
> Hi Varun,
>
> The state transitions will be independent. Helix controller may send MASTER->OFFLINE to all three nodes, for example, and if node1 completes the MASTER->OFFLINE transition first, controller will send OFFLINE->DROPPED to node1 first. Or if all three nodes complete MASTER->OFFLINE at the same time, controller may send OFFLINE->DROPPED to all three nodes together.
>
> Thanks,
> Jason
>
> From: Varun Sharma <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, August 1, 2014 11:10 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Questions about custom helix rebalancer/controller/agent
>
> Thanks a lot. Most of my questions are answered, except I have one follow up question.
>
> Let's say I have a situation with 3 masters per partition. For partition X, these are on node1, node2 and node3. Upon dropping the resource, would the partition X be offlined on all three nodes and then dropped, or can that be independent as in, node1 offlines and drops, followed by node2 and so on? Just want to check if we first wait for all the masters to offline and then initiate the offline->drop or the other way round.
>
> Thanks !
> Varun
>
> On Fri, Aug 1, 2014 at 10:33 AM, Kanak Biscuitwala <[email protected]> wrote:
>
> Dropping a resource will cause the controller to first send MASTER --> OFFLINE for all partitions, and then OFFLINE --> DROPPED.
>
> Kanak
>
> ------------------------------
> Date: Fri, 1 Aug 2014 10:30:54 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> In my case, I will have many resources - say up to 100 resources. Each of them will have partitions in the range of 100-5K. So I guess I do require the bucket size. 300K partitions is the sum of partitions across all resources, rather than the # of partitions within a single resource.
>
> Another question I had was regarding removing a resource in Helix. When a removeResource is called from HelixAdmin, would it trigger the MASTER->OFFLINE transitions for the respective partitions before the resource is removed ? To concretize my use case, we have many resources with a few thousand partitions being loaded every day. New versions of the resources keep getting loaded as brand new resources into Helix and the older versions are decommissioned/garbage collected. So we would be issuing up to 100 or so resource additions per day and up to 100 or so resource deletions every day. Just want to check that deleting a resource would also trigger the appropriate MASTER->OFFLINE transitions.
>
> Thanks
> Varun
>
> On Fri, Aug 1, 2014 at 10:18 AM, Kanak Biscuitwala <[email protected]> wrote:
>
> a) By default, there is one znode per resource, which as you know is a grouping of partitions. The biggest limitation is that ZK has a 1MB limit on znode sizing. To get around this, Helix has the concept of bucketizing, where in your ideal state, you can set a bucket size, which will effectively create that many znodes to fully represent all your partitions.
> I believe that you can have ~2k partitions before you start needing to bucketize.
>
> 300k may cause you separate issues, and you may want to consider doing things like enabling batch message mode in your ideal state so that each message we send to an instance contains transitions for all partitions hosted on that instance, rather than creating a znode per partition state change. However, in theory (we've never played with this many in practice), Helix should be able to function correctly with that many partitions.
>
> b) Yes, if you have a hard limit of 1 master per partition, Helix will transition the first node to OFFLINE before sending the MASTER transition to the new master.
>
> Kanak
>
> ------------------------------
> Date: Fri, 1 Aug 2014 10:09:24 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> Sounds fine to me. I can work without the FINALIZE notification for now, but I hope it's going to come out soon. A few more questions:
>
> a) How well does Helix scale with partitions - is each partition a separate znode inside helix ? If I have 300K partitions in Helix would that be an issue ?
> b) If a partition which was assigned as a master on node1 is now assigned as a master on node2, will node1 get a callback execution for transition from MASTER-->OFFLINE ?
>
> Thanks
> Varun
>
> On Thu, Jul 31, 2014 at 11:18 PM, Kanak Biscuitwala <[email protected]> wrote:
>
> s/run/start/g -- sorry about that, fixed in javadocs for future releases
>
> You may need to register for a notification type; I believe HelixCustomCodeRunner complains if you don't. However, you can simply ignore that notification type, and just check for INIT and FINALIZE notification types in your callback to track whether or not you're the leader. On INIT, you start your 30 minute timer, and on FINALIZE you stop it.
> You may need to wait for us to make a 0.6.4 release (we will likely do this soon) to get the FINALIZE notification.
>
> Here is an example of a custom code runner usage:
> Registration: https://github.com/kishoreg/fullmatix/blob/master/mysql-cluster/src/main/java/org/apache/fullmatix/mysql/MySQLAgent.java
> Callback: https://github.com/kishoreg/fullmatix/blob/master/mysql-cluster/src/main/java/org/apache/fullmatix/mysql/MasterSlaveRebalancer.java
>
> Regarding setting up the Helix controller, you actually don't need to instantiate a GenericHelixController. If you create a HelixManager with InstanceType.CONTROLLER, then ZKHelixManager automatically creates a GenericHelixController and sets it up with leader election. We really should update the documentation to clarify that.
>
> ------------------------------
> Date: Thu, 31 Jul 2014 23:00:13 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> Thanks for the suggestions..
>
> Seems like the HelixCustomCodeRunner could do it. However, it seems like the CustomCodeRunner only provides hooks for plugging into notifications. The documentation example in the above link suggests a run() method, which does not seem to exist.
>
> However, this may be sufficient for my case. I essentially hook in an empty CustomCodeRunner into my helix manager. Then I can instantiate my own thread which would run the above snippet and keep writing ideal states every 30 minutes. I guess I would still need to attach the GenericHelixController with the following code snippet to take action whenever the ideal state changes ??
>
> GenericHelixController controller = new GenericHelixController();
> manager.addConfigChangeListener(controller);
> manager.addLiveInstanceChangeListener(controller);
> manager.addIdealStateChangeListener(controller);
> manager.addExternalViewChangeListener(controller);
> manager.addControllerListener(controller);
>
> On Thu, Jul 31, 2014 at 6:01 PM, kishore g <[email protected]> wrote:
>
> List resourceList = helixAdmin.getResourceList();
> for each resource:
>   Compute target ideal state
>   helixAdmin.setIdealState(resource, targetIdealState);
>
> Thread.sleep(30minutes);
>
> This can work, right? This code can be part of CustomCodeRunner:
> http://helix.apache.org/javadocs/0.6.3/reference/org/apache/helix/participant/HelixCustomCodeRunner.html
> You can say you are interested in notifications but can ignore that.
>
> thanks,
> Kishore G
>
> On Thu, Jul 31, 2014 at 5:45 PM, Kanak Biscuitwala <[email protected]> wrote:
>
> i.e. helixAdmin.enableCluster(clusterName, false);
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Questions about custom helix rebalancer/controller/agent
> Date: Thu, 31 Jul 2014 17:44:40 -0700
>
> Unfortunately HelixAdmin#rebalance is a misnomer, and it is a function of all the configured instances and not the live instances. The closest you can get to that is to use the third option I listed related to CUSTOMIZED mode, where you write the mappings yourself based on what is live.
>
> Another thing you could do is pause the cluster controller and unpause it for a period every 30 minutes. That will essentially enforce that the controller will not send transitions (or do anything else, really) during the time it is paused. This sounds a little like a hack to me, but it may do what you want.
>
> Kanak
>
> ------------------------------
> Date: Thu, 31 Jul 2014 17:39:40 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> Thanks Kanak for your detailed response, this is really very helpful. I was wondering if it's possible for me to do something like the following:
>
> List resourceList = helixAdmin.getResourceList();
> for each resource:
>   Compute target ideal state
>   helixAdmin.rebalance(resource);
>
> Thread.sleep(30minutes);
>
> So, the above happens inside a while loop thread and this is the only place where we do the rebalancing ?
>
> Thanks
> Varun
>
> On Thu, Jul 31, 2014 at 5:25 PM, Kanak Biscuitwala <[email protected]> wrote:
>
> Hi Varun,
>
> Sorry for the delay.
>
> 1 and 3) There are a number of ways to do this, with various tradeoffs.
>
> - You can write a user-defined rebalancer. In helix 0.6.x, it involves implementing the following interface:
>
> https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java
>
> Essentially what it does is, given an existing ideal state, compute a new ideal state. For 0.6.x, this will read the preference lists in the output ideal state and compute a state mapping based on them. If you need more control, you can also implement:
>
> https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/rebalancer/internal/MappingCalculator.java
>
> which will allow you to create a mapping from partition to map of participant and state. In 0.7.x, we consolidated these into a single method.
>
> Here is a tutorial on the user-defined rebalancer: http://helix.apache.org/0.6.3-docs/tutorial_user_def_rebalancer.html
>
> Now, running this every 30 minutes is tricky because by default the controller responds to all cluster events (and really it needs to because it aggregates all participant current states into the external view -- unless you don't care about that).
>
> - Combined with the user-defined rebalancer (or not), you can have a GenericHelixController that doesn't listen on any events, but calls startRebalancingTimer(), into which you can pass 30 minutes. The problem with this is that the instructions at http://helix.apache.org/0.6.3-docs/tutorial_controller.html won't work as described because of a known issue. The workaround is to connect HelixManager as role ADMINISTRATOR instead of CONTROLLER.
>
> However, if you connect as ADMINISTRATOR, you have to set up leader election yourself (assuming you want a fault-tolerant controller). See https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java for a controller change listener that can do leader election, but your version will have to be different, as you actually don't want to add listeners, but rather set up a timer.
>
> This also gives you the benefit of plugging in your own logic into the controller pipeline. See createDefaultRegistry() in https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java for how to create an appropriate PipelineRegistry.
>
> - You can take a completely different approach and put your ideal state in CUSTOMIZED rebalance mode. Then you can have a meta-resource where one participant is a leader and the others are followers (you can create an ideal state in SEMI_AUTO mode, where the replica count and preference list of resourceName_0 are "ANY_LIVEINSTANCE").
> When one participant is told to become leader, you can set a timer for 30 minutes and update and write the map fields of the ideal state accordingly.
>
> 2) I'm not sure I understand the question. If you're in the JVM, you simply need to connect as a PARTICIPANT for your callbacks, but that can just be something you do at the beginning of your node startup. The rest of your code is more or less governed by your transitions, but if there are things you need to do on the side, there is nothing in Helix preventing you from doing so. See http://helix.apache.org/0.6.3-docs/tutorial_participant.html for participant logic.
>
> 4) The current state is per-instance and is literally called CurrentState. For a given participant, you can query a current state by doing something like:
>
> HelixDataAccessor accessor = helixManager.getHelixDataAccessor();
> CurrentState currentState = accessor.getProperty(
>     accessor.keyBuilder().currentState(instanceName, sessionId, resourceName));
>
> If you implement a user-defined rebalancer as above, we automatically aggregate all these current states into a CurrentStateOutput object.
>
> 5) You can use a Helix spectator:
>
> http://helix.apache.org/0.6.3-docs/tutorial_spectator.html
>
> This basically gives you a live-updating routing table for the mappings of the Helix-managed resource. However, it requires the external view to be up to date, going back to my other point of perhaps separating the concept of changing mappings every 30 minutes from the frequency at which the controller runs.
>
> Hopefully this helps.
>
> Kanak
>
> ------------------------------
> Date: Thu, 31 Jul 2014 12:13:27 -0700
> Subject: Questions about custom helix rebalancer/controller/agent
> From: [email protected]
> To: [email protected]
>
> Hi,
>
> I am trying to write a customized rebalancing algorithm. I would like to run the rebalancer every 30 minutes inside a single thread.
> I would also like to completely disable Helix triggering the rebalancer.
>
> I have a few questions:
> 1) What's the best way to run the custom controller ? Can I simply instantiate a ZKHelixAdmin object and then keep running my rebalancer inside a thread or do I need to do something more ?
>
> Apart from rebalancing, I want to do other things inside the controller, so it would be nice if I could simply fire up the controller through code. I could not find this in the documentation.
>
> 2) Same question for the Helix agent. My Helix Agent is a JVM process which does other things apart from exposing the callbacks for state transitions. Is there a code sample for the same ?
>
> 3) How do I disable Helix triggered rebalancing once I am able to run the custom controller ?
>
> 4) During my custom rebalance run, how can I get the current cluster state - is it through ClusterDataCache.getIdealState() ?
>
> 5) For clients talking to the cluster, does helix provide an easy abstraction to find the partition distribution for a helix resource ?
>
> Thanks
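
For reference, the timer approach suggested in the most recent reply (schedule the rebalancer when you become leader, cancel it otherwise) could be sketched roughly as follows. This is only an illustration using the plain JDK, not Helix code: the class name LeaderGatedTimer, its methods, and the BooleanSupplier that stands in for the leader check are all hypothetical. In a real setup the supplier would presumably wrap the controllerLeader() lookup quoted in the thread, and onControllerChange() would be invoked from the listener registered via helixManager#addControllerListener().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: runs a task periodically only while this node is leader. */
final class LeaderGatedTimer implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final BooleanSupplier isLeader; // assumption: stands in for the Helix leader check
    private final Runnable task;            // e.g. the ideal state update code
    private final long periodMinutes;
    private ScheduledFuture<?> future;      // non-null only while the timer is scheduled

    LeaderGatedTimer(BooleanSupplier isLeader, Runnable task, long periodMinutes) {
        this.isLeader = isLeader;
        this.task = task;
        this.periodMinutes = periodMinutes;
    }

    /** Call this from the controller-change callback. */
    synchronized void onControllerChange() {
        if (isLeader.getAsBoolean()) {
            if (future == null) {
                // First run happens one full period after gaining leadership.
                future = executor.scheduleAtFixedRate(
                        this::runIfStillLeader, periodMinutes, periodMinutes, TimeUnit.MINUTES);
            }
        } else if (future != null) {
            // Leadership lost: stop scheduling further runs.
            future.cancel(false);
            future = null;
        }
    }

    /** Re-checks leadership at trigger time, in case a change callback was missed. */
    private void runIfStillLeader() {
        if (isLeader.getAsBoolean()) {
            task.run();
        }
    }

    synchronized boolean isScheduled() {
        return future != null;
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Re-checking leadership inside the task mirrors the "when it is triggered, check if you're the leader" advice, so a stale timer does not write ideal states after leadership has moved.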
