How to handle total Zookeeper restart

Ming Fang Sun, 03 Mar 2013 20:58:46 -0800

Hi

When I have a working Helix cluster, all participants are working fine, and for 
whatever reason I lose the entire Zookeeper cluster (including all state),
what is the best way to handle this?


Ideally I want all the participants to continue working, so that the only 
capability I would lose is Helix's ability to fail over.
Upon restart of Zookeeper, the Controllers and Participants should register 
their latest state back to the new Zookeeper cluster.
However, my tests so far show that even though the HelixManagers reconnect, 
they do not write the necessary data into Zookeeper for the cluster to function 
correctly.
For example, the external view callbacks are not showing the participants at 
all.

Is this something Helix should handle or is it up to the applications to detect 
the failure and then recreate new HelixManagers?

Thanks
--ming
