Hi Varun,
Sorry for the delay.
1 and 3) There are a number of ways to do this, with various tradeoffs.
- You can write a user-defined rebalancer. In helix 0.6.x, it involves
implementing the following interface:
https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java
Essentially, given an existing ideal state, it computes a new ideal state. For
0.6.x, Helix will then read the preference lists in the output ideal state and
compute a state mapping based on them. If you need more control, you can
also implement:
https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/rebalancer/internal/MappingCalculator.java
which will allow you to create a mapping from partition to map of participant
and state. In 0.7.x, we consolidated these into a single method.
Here is a tutorial on the user-defined rebalancer:
http://helix.apache.org/0.6.3-docs/tutorial_user_def_rebalancer.html
Now, running this every 30 minutes is tricky because by default the controller
responds to all cluster events (and really it needs to because it aggregates
all participant current states into the external view -- unless you don't care
about that).
- Combined with the user-defined rebalancer (or not), you can have a
GenericHelixController that doesn't listen on any events, but calls
startRebalancingTimer(), into which you can pass 30 minutes. The problem with
this is that the instructions at
http://helix.apache.org/0.6.3-docs/tutorial_controller.html won't work as
described because of a known issue. The workaround is to connect HelixManager
as role ADMINISTRATOR instead of CONTROLLER.
However, if you connect as ADMINISTRATOR, you have to set up leader election
yourself (assuming you want a fault-tolerant controller). See
https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java
for a controller change listener that can do leader election, but your version
will have to be different, as you actually don't want to add listeners, but
rather set up a timer.
This also gives you the benefit of plugging in your own logic into the
controller pipeline. See
https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
createDefaultRegistry() for how to create an appropriate PipelineRegistry.
- You can take a completely different approach and put your ideal state in
CUSTOMIZED rebalance mode. Then you can have a meta-resource where one
participant is a leader and the others are followers (you can create an ideal
state in SEMI_AUTO mode, where the replica count and preference list of
resourceName_0 is "ANY_LIVEINSTANCE"). When one participant
is told to become leader, you can set a timer for 30 minutes and update and
write the map fields of the ideal state accordingly.
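To make the last option a bit more concrete, here is a rough sketch of the leader-side logic in plain Java. It is not Helix code: PeriodicMappingSketch, computeMapFields and startOnLeadership are hypothetical names, the round-robin placement and the MASTER/SLAVE state names are assumptions, and the step that writes the result back into the ideal state is left to the rebalanceOnce callback you supply.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Helix: models only the leader-side timer
// and the shape of the data it would write.
class PeriodicMappingSketch {

  // Builds partition -> (instance -> state), the shape of an ideal state's
  // map fields. Placement is plain round-robin (an assumption); the first
  // replica of each partition is MASTER and the rest are SLAVE.
  static Map<String, Map<String, String>> computeMapFields(
      List<String> partitions, List<String> liveInstances, int replicas) {
    Map<String, Map<String, String>> mapFields = new LinkedHashMap<>();
    if (liveInstances.isEmpty()) {
      return mapFields; // nothing to assign to
    }
    int count = Math.min(replicas, liveInstances.size());
    for (int p = 0; p < partitions.size(); p++) {
      Map<String, String> stateMap = new LinkedHashMap<>();
      for (int r = 0; r < count; r++) {
        String instance = liveInstances.get((p + r) % liveInstances.size());
        stateMap.put(instance, r == 0 ? "MASTER" : "SLAVE");
      }
      mapFields.put(partitions.get(p), stateMap);
    }
    return mapFields;
  }

  // Call when this participant becomes leader of the meta-resource, and shut
  // the returned executor down when it is told to step down.
  static ScheduledExecutorService startOnLeadership(
      Runnable rebalanceOnce, long periodMinutes) {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    // scheduleAtFixedRate cancels future runs if a run throws, so guard it.
    Runnable guarded = () -> {
      try {
        rebalanceOnce.run();
      } catch (RuntimeException e) {
        e.printStackTrace();
      }
    };
    timer.scheduleAtFixedRate(guarded, 0, periodMinutes, TimeUnit.MINUTES);
    return timer;
  }
}
```

With a period of 30, the callback runs once immediately and then every 30 minutes on a single thread. Inside it you would call something like computeMapFields and write the result into the ideal state.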
2) I'm not sure I understand the question. If you're in the JVM, you simply
need to connect as a PARTICIPANT for your callbacks, but that can just be
something you do at the beginning of your node startup. The rest of your code
is more or less governed by your transitions, but if there are things you need
to do on the side, there is nothing in Helix preventing you from doing so. See
http://helix.apache.org/0.6.3-docs/tutorial_participant.html for participant
logic.
4) The current state is per-instance and is literally called CurrentState. For
a given participant, you can query a current state by doing something like:
HelixDataAccessor accessor = helixManager.getHelixDataAccessor();
CurrentState currentState =
    accessor.getProperty(accessor.keyBuilder().currentState(instanceName,
        sessionId, resourceName));
If you implement a user-defined rebalancer as above, we automatically aggregate
all these current states into a CurrentStateOutput object.
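Conceptually, the aggregation turns per-instance state into a per-partition view. The following is a plain-Java illustration of that idea only. CurrentStateAggregationSketch is a hypothetical class, not Helix's CurrentStateOutput, and the nested-map representation is an assumption made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration; not Helix's CurrentStateOutput.
class CurrentStateAggregationSketch {

  // Input:  instance -> (partition -> state), one entry per participant.
  // Output: partition -> (instance -> state), the view a rebalancer works from.
  static Map<String, Map<String, String>> aggregate(
      Map<String, Map<String, String>> perInstance) {
    Map<String, Map<String, String>> perPartition = new TreeMap<>();
    for (Map.Entry<String, Map<String, String>> inst : perInstance.entrySet()) {
      for (Map.Entry<String, String> part : inst.getValue().entrySet()) {
        perPartition
            .computeIfAbsent(part.getKey(), k -> new TreeMap<>())
            .put(inst.getKey(), part.getValue());
      }
    }
    return perPartition;
  }
}
```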
5) You can use a Helix spectator:
http://helix.apache.org/0.6.3-docs/tutorial_spectator.html
This basically gives you a live-updating routing table for the mappings of the
Helix-managed resource. However, it requires the external view to be up to
date, going back to my other point of perhaps separating the concept of
changing mappings every 30 minutes from the frequency at which the controller
runs.
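To illustrate what a routing table lookup boils down to, here is a small plain-Java sketch. It does not use the Helix spectator API: RoutingLookupSketch and instancesFor are hypothetical names, and the external view is modeled as a nested map of partition to (instance to state), which is an assumption for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of a routing lookup; not the Helix spectator API.
class RoutingLookupSketch {

  // externalView: partition -> (instance -> state).
  // Returns the instances holding the partition in the given state, sorted by
  // name so the result is deterministic.
  static List<String> instancesFor(
      Map<String, Map<String, String>> externalView, String partition, String state) {
    List<String> result = new ArrayList<>();
    Map<String, String> stateMap =
        externalView.getOrDefault(partition, Collections.emptyMap());
    for (Map.Entry<String, String> e : stateMap.entrySet()) {
      if (state.equals(e.getValue())) {
        result.add(e.getKey());
      }
    }
    Collections.sort(result);
    return result;
  }
}
```

A client would route writes to the instance returned for MASTER and could spread reads over the ones returned for SLAVE. If the external view is stale, so is the answer, which is why the controller still needs to keep it current.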
Hopefully this helps.
Kanak
Date: Thu, 31 Jul 2014 12:13:27 -0700
Subject: Questions about custom helix rebalancer/controller/agent
From: [email protected]
To: [email protected]
Hi,
I am trying to write a customized rebalancing algorithm. I would like to run
the rebalancer every 30 minutes inside a single thread. I would also like to
completely disable Helix triggering the rebalancer.
I have a few questions:
1) What's the best way to run the custom controller ?
Can I simply instantiate a ZKHelixAdmin object and then keep running my
rebalancer inside a thread or do I need to do something more.
Apart from rebalancing, I want to do other things inside the controller, so
it would be nice if I could simply fire up the controller through code. I could
not find this in the documentation.
2) Same question for the Helix agent. My Helix Agent is a JVM process which
does other things apart from exposing the callbacks for state transitions. Is
there a code sample for the same ?
3) How do I disable Helix triggered rebalancing once I am able to run the
custom controller ?
4) During my custom rebalance run, how can I get the current cluster state - is
it through ClusterDataCache.getIdealState() ?
5) For clients talking to the cluster, does helix provide an easy abstraction
to find the partition distribution for a helix resource ?
Thanks