Hi Ming,

Helix depends on the data in ZooKeeper. It's OK for ZooKeeper to restart, and Helix will handle that, but if ZooKeeper loses its state (the data directory), then unfortunately we cannot recover the state.
How did you lose the ZooKeeper cluster (including state)?

thanks,
Kishore G

On Sun, Mar 3, 2013 at 8:58 PM, Ming Fang <[email protected]> wrote:

> Hi
>
> When I have a working Helix cluster, with all participants working fine,
> and for whatever reason I lose the entire Zookeeper cluster (including all
> state),
> what is the best way to handle this?
>
> Ideally I want all the participants to continue working, and the only
> capability I would lose is Helix's ability to fail over.
> Upon restart of Zookeeper, the Controllers and Participants should
> register their latest state back to the new Zookeeper cluster.
> However, my tests thus far show that even though the HelixManagers
> reconnect, they do not write the necessary data into Zookeeper for the
> cluster to function correctly.
> For example, the external view callbacks are not showing the participants
> at all.
>
> Is this something Helix should handle, or is it up to the applications to
> detect the failure and then recreate new HelixManagers?
>
> Thanks
> --ming
