Thomas, in the scenario you give there are two simultaneous failures in a 3-node ensemble, which can tolerate only one, so it will not recover correctly. A has failed because it is not up. B has failed because it lost all its data.
It would be good for ZooKeeper to not come up in that scenario. Perhaps what we need is something similar to your safe state proposal. Basically, a server that has forgotten everything should not be allowed to vote in the leader election. That would avoid your scenario. We just need to put a flag file in the data directory to say that the data is valid and that the server can therefore vote.

ben

________________________________________
From: thomas.john...@sun.com [thomas.john...@sun.com]
Sent: Tuesday, December 16, 2008 4:02 PM
To: email@example.com
Subject: Re: What happens when a server loses all its state?

Mahadev Konar wrote:
> Hi Thomas,
>
>> More generally, is it a safe assumption to make that the ZooKeeper
>> service will maintain all its guarantees if a minority of servers lose
>> persistent state (due to bad disks, etc) and restart at some point in
>> the future?
>>
> Yes that is true.
>

Great - thanks Mahadev.

Not to drag this on more than necessary, please bear with me for one more example of 'amnesia' that comes to mind. I have a set of ZooKeeper servers A, B, C.

- C is currently not running, A is the leader, B is the follower.
- A proposes zxid1 to A and B, both acknowledge.
- A asks A to commit (which it persists), but before the same commit request reaches B, all servers go down (say a power failure).
- Later, B and C come up (A is slow to reboot), but B has lost all state due to disk failure.
- C becomes the new leader and perhaps continues with some more new transactions.

Likely I'm misunderstanding the protocol, but have I effectively lost zxid1 at this point? What would happen when A comes back up?

Thanks.
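
To make the flag-file idea from the reply concrete, here is a minimal Python sketch of how such a rule would play out in the A, B, C scenario. It is only an illustration of the proposal, not how ZooKeeper is implemented: the marker name `data_valid` and the helpers `mark_data_valid`, `may_vote`, `can_elect_leader` and `thomas_scenario` are all hypothetical, and the sketch assumes a server with existing state simply has the flag already.

```python
import os
import tempfile

# Hypothetical marker name; not a real ZooKeeper file.
MARKER = "data_valid"


def mark_data_valid(data_dir):
    """Write the flag file and sync it so it survives a crash."""
    with open(os.path.join(data_dir, MARKER), "w") as f:
        f.write("ok\n")
        f.flush()
        os.fsync(f.fileno())


def may_vote(data_dir):
    """A server may vote only if its data directory carries the flag."""
    return os.path.isfile(os.path.join(data_dir, MARKER))


def can_elect_leader(up_servers, ensemble_size):
    """up_servers maps server name -> data directory of a running server.

    Returns (quorum_reached, voters); a quorum is a strict majority of
    the whole ensemble, counting only servers that are allowed to vote.
    """
    voters = [name for name, d in up_servers.items() if may_vote(d)]
    return len(voters) > ensemble_size // 2, voters


def thomas_scenario(a_is_up=False):
    """Model the restart: B has lost its disk, C has old but valid data."""
    with tempfile.TemporaryDirectory() as root:
        names = ["A", "B", "C"] if a_is_up else ["B", "C"]
        up = {}
        for name in names:
            up[name] = os.path.join(root, name)
            os.makedirs(up[name])
            if name != "B":  # B's data directory is empty: no flag
                mark_data_valid(up[name])
        return can_elect_leader(up, ensemble_size=3)


if __name__ == "__main__":
    print(thomas_scenario(a_is_up=False))
    print(thomas_scenario(a_is_up=True))
```

Under these assumptions, B and C alone cannot form a quorum because B is not allowed to vote, so C cannot become leader without A. Once A is back, A and C are a majority of servers with valid data, and A still holds zxid1.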