I was thinking less about the decomposition and more about the types of problems to be solved with custom rebalancers. Here are a few that come to mind that would make interesting test cases. It would be nice if these were relatively easy to implement: they require custom logic, and they would be super useful for many systems.
- balance due to a hot spot. The idea would be to split the load so the hot spot is cut down to size.
- balance due to workload patterns, such as some partitions being hot during the day and others at night, say because of US vs non-US traffic, or read-heavy by day and write-heavy by night as updates are computed offline and reloaded in bulk
- change balancing for isolation in a multi-tenant situation

- Bob

> Here is my two cents. Currently we are mostly running the rebalancer inside
> the controller pipeline, so the rebalancer is triggered by Zookeeper change
> notifications and it gets a free copy of the Zookeeper cluster data snapshot.
> However, the rebalancer may also be triggered in other ways, like timers,
> system load changes, or any external signals. In addition, the rebalancer may
> also need to access data from some monitoring systems, or external
> services like MySQL.
>
> We could probably separate the logic of the rebalancer from the controller.
> The rebalancer is all about setting the ideal-state, i.e. setting the target
> mappings of partition-->(host, state). The rebalancer logic will be mostly
> application specific. On the other hand, the controller is all about bringing
> the current-state to the ideal-state. The controller logic includes using a
> semi-greedy algorithm (e.g. shortest path) to calculate the next mappings
> from the current mapping (i.e. current-state) to the target mapping (i.e.
> ideal-state), applying constraints, figuring out optimal parallelism, etc.
> The controller logic will be mostly generic to all applications. The only
> protocol between the rebalancer and the controller is the ideal-state (i.e.
> the target partition-->(host, state) mappings). In this sense, every
> rebalancer is customized, and we can provide some default implementations
> like the auto or semi-auto ones. The rebalancer can also run anywhere,
> provided that there is only one instance running. This can be achieved through
> leader election, running with the controller, or using a custom-code invoker.
>
> On Fri, Oct 10, 2014 at 12:25 PM, kishore g <[email protected]> wrote:
>
>> Hi,
>>
>> Even though we have a couple of ways of writing a custom rebalancer (one on
>> the participant side and another on the controller side), I don't think it's
>> trivial for someone to write them without understanding all the
>> internal details of Helix.
>>
>> I am starting this thread to see if others have any thoughts on making
>> it easier for someone to get started and write their own rebalancer as part
>> of the quick start.
>>
>> thanks,
>> Kishore G
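
To make the split discussed in this thread concrete, here is a minimal sketch of an application-specific rebalancer that only produces an ideal-state, and a generic controller step that only compares current-state with ideal-state. It is written in plain Python and is not the Helix API: the names `rebalance_by_load` and `next_transitions`, the ONLINE/OFFLINE states, and the "bring replicas up before taking them down" ordering are all assumptions made for illustration.

```python
# Illustrative sketch only: none of these names come from Helix.
from typing import Dict, List, Tuple

# partition -> {host: state}; the same shape is used for ideal-state and current-state.
Mapping = Dict[str, Dict[str, str]]


def rebalance_by_load(partition_load: Dict[str, float], hosts: List[str]) -> Mapping:
    """Hypothetical custom rebalancer: place the heaviest partitions first,
    each on the host with the least load assigned so far."""
    host_load = {h: 0.0 for h in hosts}
    ideal: Mapping = {}
    # Sort by descending load; break ties by partition name so the result is deterministic.
    for partition, load in sorted(partition_load.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(hosts, key=lambda h: (host_load[h], h))
        ideal[partition] = {target: "ONLINE"}
        host_load[target] += load
    return ideal


def next_transitions(current: Mapping, ideal: Mapping,
                     max_parallel: int) -> List[Tuple[str, str, str, str]]:
    """Generic controller step: return at most max_parallel
    (partition, host, from_state, to_state) transitions toward the ideal-state."""
    transitions = []
    for partition in sorted(set(current) | set(ideal)):
        have = current.get(partition, {})
        want = ideal.get(partition, {})
        for host in sorted(set(have) | set(want)):
            src = have.get(host, "OFFLINE")
            dst = want.get(host, "OFFLINE")
            if src != dst:
                transitions.append((partition, host, src, dst))
    # Assumed constraint: schedule bring-ups before take-downs (stable sort).
    transitions.sort(key=lambda t: t[3] == "OFFLINE")
    return transitions[:max_parallel]
```

The rebalancer here could be driven by a timer or a monitoring signal instead of a Zookeeper notification, since the mapping it returns is the only thing the controller step consumes. A real controller would also have to respect the state model and per-host constraints, which this sketch leaves out.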
