Hi Jiangjie,

There are a few things we need to clarify before using Helix to manage your cluster.

1. What the resource is and how it is partitioned. Based on your description, the resource seems to be a set of machines (servers and clients).
2. Who hosts the resource. Helix is about resource assignment in distributed systems. For example, if you have a database, it may be partitioned and hosted by a set of nodes. In your case, it's not clear who hosts the resource.
3. What state model you are going to use.
4. Failure handling. In your description, if a server fails, a state transition will be triggered on both servers and clients. It's not clear which server should receive the notification.

Once we are clear on these, it should be fairly straightforward to use Helix. You may also be interested in looking at a few simple examples under the recipes folder (https://github.com/apache/helix/tree/master/recipes).

Thanks,
Jason

From: jianjie feng <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, January 15, 2015 at 7:57 PM
To: "[email protected]" <[email protected]>
Subject: HELP

Hi,

We are trying to use Helix to manage our clusters (300+ nodes), and we have run into a problem. Please help! Let me describe it.

Our cluster is made up of servers and clients. Servers are partitioned into groups (partitions in Helix), and clients are partitioned accordingly. We are trying to implement fault tolerance like this:

1) If one server node fails, trigger a state transition on the server side and do something like print a log, raise an alarm, and restart the server process.
2) Then trigger a state transition on all client nodes belonging to this partition and do something like kick out the failed server and release the failed server's resources on the client.

Could someone please tell me how to implement this using Helix? It would be even better if you could show me some code samples. Thanks!
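
Since the question asks for a code sample, here is a minimal, library-free Java sketch of the failover flow it describes. It does not use Helix, and none of the class or method names below (FailoverSketch, Client, serverFailed, and so on) are Helix APIs; they are hypothetical names chosen for illustration. The sketch also fills in the four open points with assumptions that may not match the real setup: each server group is one partition, servers host the partition, the state model has only two states (ONLINE and OFFLINE), and every client of a partition is notified when one of its servers goes offline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, library-free sketch; nothing here is a Helix API.
class FailoverSketch {
    enum State { ONLINE, OFFLINE }

    // A client keeps the set of servers it currently considers usable.
    static class Client {
        public final String name;
        public final Set<String> liveServers = new HashSet<>();

        public Client(String name) { this.name = name; }

        // Client-side reaction: kick the failed server, release its resources.
        public void onServerOffline(String server) {
            liveServers.remove(server);
        }
    }

    // partition (server group) -> server -> current state
    private final Map<String, Map<String, State>> partitions = new HashMap<>();
    // partition -> clients attached to that partition
    private final Map<String, List<Client>> clients = new HashMap<>();
    // Stand-in for "print log, trigger alarm, restart the server process".
    public final List<String> log = new ArrayList<>();

    public void addServer(String partition, String server) {
        partitions.computeIfAbsent(partition, p -> new HashMap<>())
                  .put(server, State.ONLINE);
        for (Client c : clients.getOrDefault(partition, new ArrayList<>())) {
            c.liveServers.add(server);
        }
    }

    public void addClient(String partition, Client client) {
        clients.computeIfAbsent(partition, p -> new ArrayList<>()).add(client);
        Map<String, State> servers =
            partitions.getOrDefault(partition, new HashMap<>());
        for (Map.Entry<String, State> e : servers.entrySet()) {
            if (e.getValue() == State.ONLINE) {
                client.liveServers.add(e.getKey());
            }
        }
    }

    // Step 1: server-side ONLINE -> OFFLINE transition.
    // Step 2: notify only the clients of the affected partition.
    public void serverFailed(String partition, String server) {
        Map<String, State> servers = partitions.get(partition);
        if (servers == null || servers.get(server) != State.ONLINE) {
            return; // unknown server or already offline: nothing to do
        }
        servers.put(server, State.OFFLINE);
        log.add("ALARM: " + server + " ONLINE->OFFLINE in " + partition);
        for (Client c : clients.getOrDefault(partition, new ArrayList<>())) {
            c.onServerOffline(server);
        }
    }

    public static void main(String[] args) {
        FailoverSketch cluster = new FailoverSketch();
        cluster.addServer("group0", "server1");
        cluster.addServer("group0", "server2");
        Client client = new Client("clientA");
        cluster.addClient("group0", client);

        cluster.serverFailed("group0", "server1");
        System.out.println(cluster.log);
        System.out.println(client.liveServers);
    }
}
```

The point of the sketch is the ordering: the server-side transition is recorded first, and only the clients attached to the affected partition are notified afterwards. How this maps onto Helix depends on the answers to the four points in the reply; the examples under the recipes folder are the place to look for real Helix code.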
