Hi Vu,

What mechanism did you choose to balance across multiple resources? I was planning to add a recipe/rebalancer to solve this use case.

thanks,
Kishore G

On Fri, Jan 3, 2014 at 1:19 PM, Vu Nguyen <[email protected]> wrote:

> I contacted our platform team. They're making changes that will make the
> hosts' IPs more static, so I can use those and pass them to Helix.
>
> We'll have extra nodes allocated and available to pick up a task that was
> previously assigned to a failed node. In addition, our usual AWS setup
> will replace the failed node for us. But we'll still have the extra
> standby nodes because that should be faster than waiting for the
> replacement node--I think it tends to take a few minutes or more.
>
> Thanks
>
> Vu
>
> On Fri, Jan 3, 2014 at 8:32 AM, kishore g <[email protected]> wrote:
>
>> Hi Vu,
>>
>> Currently, Helix does not have the ability to take a ZooKeeper client from
>> outside. It's possible to add that feature, but I need to think more about
>> the ZooKeeper state changes like disconnect/connect, session expiry, etc.
>>
>> Looks like getting the ZK hosts/ports from your platform and passing them
>> to Helix is a possible option for now. Meanwhile, we will look into what it
>> takes to accept a ZooKeeper client as input.
>>
>> Regarding the rebalancing for multiple resources, of the options Kanak
>> provided, start with #2 first and then implement #1 using a USER_DEFINED
>> rebalancer. This functionality is generic enough that we can provide a
>> default implementation in Helix, or if you implement one we can add it to
>> helix-core.
>>
>> Let us know if you need help on implementing a rebalancer that works
>> across resources.
>>
>> Another question: what is the expected behavior when a node fails? Will
>> you have standby nodes to pick up the task, or will you assign it to a
>> node that is already running another task?
>>
>> thanks,
>> Kishore G
>>
>> On Fri, Jan 3, 2014 at 12:14 AM, Vu Nguyen <[email protected]> wrote:
>>
>>> The main issue is that we already have an infrastructure here for
>>> ZooKeeper that has a separate mechanism for clients to discover the ZK
>>> server hosts. That's provided by our platform team. So client
>>> applications don't actually provide the ZooKeeper hosts at this point. I
>>> likely could get access to that information somehow, though. However, I
>>> would prefer to re-use what our platform team provides in case they make
>>> any modifications to how hosts are discovered.
>>>
>>> By using our platform libraries, we get a ZooKeeper client that's ready
>>> to use directly. I was thinking that we could get Helix to use this for
>>> any ZooKeeper operations. If we get disconnected from ZooKeeper, the
>>> discovery mechanism would be re-used automatically for reconnecting,
>>> without requiring us to explicitly provide the hosts/ports.
>>>
>>> Thanks
>>>
>>> Vu
>>>
>>> On Wed, Jan 1, 2014 at 9:26 PM, Kanak Biscuitwala
>>> <[email protected]> wrote:
>>>
>>>> Not sure I follow. Is your problem that Helix creates the cluster as a
>>>> child of the root node (e.g. /clusterName) while you would like it to be
>>>> something else (e.g. /path/to/custom/root/clusterName)?
>>>>
>>>> I'm also unclear about what you mean about discovering ZK servers. How
>>>> would you be able to leverage a path in ZK to discover ZK?
>>>>
>>>> Right now Helix requires long-running ZK servers and assumes that you
>>>> as the application know how to connect to them (i.e. you know the
>>>> hosts/ports). If that assumption holds, I believe it should work
>>>> independent of deployment (cloud provider, private datacenter, or anything
>>>> else).
>>>>
>>>> I'm not really sure what you're trying to adapt with the adapter. Could
>>>> you clarify?
>>>>
>>>> I'm on #apachehelix on freenode if that's more convenient.
>>>>
>>>> Thanks,
>>>> Kanak
>>>> ------------------------------
>>>> Date: Wed, 1 Jan 2014 21:07:36 -0800
>>>> Subject: Re: helix rebalancing for multiple resources
>>>> From: [email protected]
>>>> To: [email protected]
>>>> CC: [email protected]
>>>>
>>>> Yes, that is helpful.
>>>>
>>>> Another big requirement that I forgot to mention is running this on a
>>>> cloud service provider, like AWS. We already have a shared ZooKeeper setup
>>>> there with our own client. Ideally, I could inject a custom client for
>>>> Helix to use for operations. The main differences we would require are
>>>> a custom top-level path (/appname), which our client requires, and having
>>>> the client handle discovering and connecting to the ZooKeeper servers.
>>>>
>>>> Is support for AWS and other cloud providers on the roadmap?
>>>>
>>>> Also, for the short term, do you see any complications in us creating
>>>> an adapter client that Helix would use to bridge that gap? Or would it be
>>>> much more complicated than I am hoping for?
>>>>
>>>> Thanks
>>>>
>>>> Vu
>>>>
>>>> On Wed, Jan 1, 2014 at 8:36 PM, Kanak Biscuitwala
>>>> <[email protected]> wrote:
>>>>
>>>> Resending since I realized you might not be registered on the user list
>>>> yet. By the way, for your specific use case, I would personally lean
>>>> towards the CustomCodeRunner along with the CUSTOMIZED IdealState rebalance
>>>> mode. Then when nodes enter and exit, you can change the IdealState
>>>> yourself and Helix will fire the transitions. This will most easily give
>>>> you the policy-driven global view you're looking for.
>>>>
>>>> ---
>>>>
>>>> Hi Vu,
>>>>
>>>> Your understanding is basically correct. The controller will rebalance
>>>> each resource in sequence, at most one controller pipeline execution is
>>>> going on at any one time, and there is no parallelism within the controller
>>>> pipeline (other than batch reading and writing the cluster at the beginning
>>>> and end).
>>>>
>>>> Here are some things that may be of use to know:
>>>>
>>>> 1. You can plug in your own code to help decide how to rebalance your
>>>> cluster in one of two ways:
>>>> - Using the CustomCodeRunner on the participant side so that you can
>>>> update the IdealState whenever the cluster changes:
>>>> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
>>>> - Implementing a Rebalancer with USER_DEFINED rebalance mode:
>>>> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c
>>>>
>>>> In either case, Helix will still fire transitions according to
>>>> constraints and react to node entry/exit.
>>>>
>>>> 2. Helix supports adding tags to nodes (via InstanceConfig), and
>>>> specifying tags in each resource IdealState. Then, a tagged resource will
>>>> only be assigned to nodes with the corresponding tag present.
>>>>
>>>> 3. You can specify max partitions per resource per node in the
>>>> IdealState of the resource (this should be 1 in your case)
>>>>
>>>> 4. You can combine any of the above 3 if that makes sense (e.g. change
>>>> node tags whenever a cluster change happens, thus constraining how Helix
>>>> will assign everything)
>>>>
>>>> Is that helpful?
>>>>
>>>> Kanak
>>>> ------------------------------
>>>> Date: Wed, 1 Jan 2014 20:31:56 -0800
>>>> Subject: helix rebalancing for multiple resources
>>>> From: [email protected]
>>>> To: [email protected]
>>>>
>>>> Hi,
>>>> We're looking into creating something like a distributed task
>>>> processing cluster. We already have existing code for the processing task
>>>> on a single host.
>>>> So that results in stronger restrictions on what we're doing:
>>>> - partitioned task A: a single partition needs to be assigned to a single
>>>> node, and a node may have only a single partitioned task
>>>> - another set of non-partitioned tasks (e.g. B, C, D) also needs to be
>>>> assigned nodes, but it would be most efficient if those tasks are assigned
>>>> to separate nodes so any single node has at most 1 task (either partitioned
>>>> A, B, C, D, etc.)
>>>>
>>>> This seems to require a global view of the tasks. However, from the
>>>> examples and the Rebalancer code, it appears that the resource
>>>> mappings/assignments are independent of one another. Is that correct? If
>>>> so, is Apache Helix the right framework for us, given the requirements
>>>> above?
>>>>
>>>> I saw that it might be possible to find the current resource assignment
>>>> for other resources during the rebalancing calculation methods, but I was
>>>> then concerned about concurrency issues--if the rebalance for task A and
>>>> the rebalance for B were computed at the same time.
>>>>
>>>> Thanks for any and all feedback.
>>>>
>>>> Vu Nguyen
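
For readers following the thread, the cross-resource constraint discussed above (every node runs at most one task, whether a partition of A or one of B, C, D, and standby nodes pick up the work of a failed node) can be sketched in a few lines. This is a minimal, library-free illustration in plain Python and not Helix code: the function name `assign_one_task_per_node`, the task labels such as `A_0`, and the node names are all hypothetical. In a real deployment, logic of this kind would presumably live wherever the thread suggests, for example in a USER_DEFINED rebalancer or in a CustomCodeRunner callback that updates the IdealState.

```python
from typing import Dict, Iterable, Optional


def assign_one_task_per_node(
    tasks: Iterable[str],
    live_nodes: Iterable[str],
    previous: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Give each task at most one node and each node at most one task.

    Hypothetical helper for illustration only; it does not call Helix.
    `tasks` mixes partitions of task A (e.g. "A_0") with whole tasks ("B").
    """
    previous = previous or {}
    task_list = list(dict.fromkeys(tasks))  # de-duplicate, keep order
    live = list(dict.fromkeys(live_nodes))
    live_set = set(live)
    assignment: Dict[str, Optional[str]] = {}
    used = set()

    # Pass 1: keep a task where it already runs if that node is still alive,
    # so a node failure only moves the tasks that were on the failed node.
    for task in task_list:
        node = previous.get(task)
        if node in live_set and node not in used:
            assignment[task] = node
            used.add(node)

    # Pass 2: hand the remaining tasks to idle (standby) nodes, in order.
    idle = iter([n for n in live if n not in used])
    for task in task_list:
        if task not in assignment:
            assignment[task] = next(idle, None)  # None: no idle node left
    return assignment


# Example: four tasks, five nodes (n5 starts out as a standby).
nodes = ["n1", "n2", "n3", "n4", "n5"]
tasks = ["A_0", "A_1", "B", "C"]
first = assign_one_task_per_node(tasks, nodes)

# n2 fails: recompute with the previous mapping as a hint.
survivors = [n for n in nodes if n != "n2"]
second = assign_one_task_per_node(tasks, survivors, previous=first)
print(first)
print(second)
```

Because the whole mapping is computed in one call over all tasks, there is no per-resource ordering to worry about, which is the "global view" asked for in the original message. A task mapped to `None` simply has no free node and would wait until a replacement node joins.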
