Hi, we are trying to use Helix to manage our clusters (300+ nodes) and we have run into a problem. Please help!
Let me describe it. Our clusters are made up of servers and clients. Servers are partitioned into groups (partitions in Helix), and clients are partitioned accordingly. We are trying to implement fault tolerance like this:
1) If one server node fails, trigger a state transition on the server side that does things like write a log entry, raise an alarm, and restart the server process.
2) Then trigger a state transition on all client nodes belonging to this partition that does things like kick out the failed server and release the failed server's resources on the client.
Could someone please tell me how to implement this using Helix? It would be even better if you could show me some code samples. Thanks!
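In case it helps to make the flow concrete, here is a rough plain-Java sketch of the behavior we are after. It does not use any Helix API; the class and method names (FailoverSketch, Cluster, Client, onServerFailed) are made up for illustration, and the restart step is only a comment. What we do not know is how to map these two steps onto Helix state transitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FailoverSketch {

    /** Hypothetical client that tracks the servers of its partition it currently uses. */
    static class Client {
        final String name;
        final Set<String> activeServers = new HashSet<>();

        Client(String name) {
            this.name = name;
        }

        // Step 2: kick out the failed server and release what was held for it.
        void onServerFailed(String server) {
            if (activeServers.remove(server)) {
                System.out.println(name + ": released resources for " + server);
            }
        }
    }

    /** Hypothetical stand-in for whatever coordinates servers and clients. */
    static class Cluster {
        final Map<String, String> serverToPartition = new HashMap<>();
        final Map<String, List<Client>> clientsByPartition = new HashMap<>();
        final List<String> alarms = new ArrayList<>();

        void addServer(String server, String partition) {
            serverToPartition.put(server, partition);
        }

        // For simplicity, clients only learn about servers registered before them.
        void addClient(Client client, String partition) {
            clientsByPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(client);
            for (Map.Entry<String, String> e : serverToPartition.entrySet()) {
                if (e.getValue().equals(partition)) {
                    client.activeServers.add(e.getKey());
                }
            }
        }

        void onServerFailed(String server) {
            String partition = serverToPartition.get(server);
            if (partition == null) {
                return; // unknown server, nothing to do
            }
            // Step 1 (server side): log, raise an alarm, restart the process.
            System.out.println("server failed: " + server + " in " + partition);
            alarms.add(server);
            // A real system would restart the server process here.

            // Step 2 (client side): notify only the clients of this partition.
            List<Client> clients =
                clientsByPartition.getOrDefault(partition, Collections.<Client>emptyList());
            for (Client c : clients) {
                c.onServerFailed(server);
            }
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        cluster.addServer("server-1", "partition-0");
        cluster.addServer("server-2", "partition-0");
        Client client = new Client("client-a");
        cluster.addClient(client, "partition-0");
        cluster.onServerFailed("server-1");
    }
}
```

The important point for us is that step 2 must run only on the clients of the affected partition, and only after step 1 has been triggered for the failed server.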
