In the scenario you give, you have two simultaneous failures with 3 nodes, and
a 3-node ensemble can only tolerate one, so it will not recover correctly. A
has failed because it is not up. B has failed because it lost all its data.
It would be good for ZooKeeper not to come up in that scenario. Perhaps what we
need is something similar to your safe state proposal: basically, a server that
has forgotten everything should not be allowed to vote in the leader election.
That would avoid your scenario. We would just need to put a flag file in the
data directory to say that the data is valid and that the server can therefore
vote.
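
A rough sketch of the flag-file idea, in Python only for illustration. This is
not ZooKeeper code: the marker name data.valid and the helpers
mark_data_valid, may_vote, and can_elect are all made up here, and the
majority check is a simplification of the real election.

```python
import os
import tempfile

# Hypothetical marker name; ZooKeeper does not define such a file.
VALID_MARKER = "data.valid"


def mark_data_valid(data_dir):
    """Write the marker once the server holds state it can vouch for."""
    path = os.path.join(data_dir, VALID_MARKER)
    with open(path, "w") as f:
        f.write("valid\n")
        f.flush()
        os.fsync(f.fileno())  # the marker itself should survive a crash


def may_vote(data_dir):
    """A server whose data directory lacks the marker must not vote."""
    return os.path.isfile(os.path.join(data_dir, VALID_MARKER))


def can_elect(reachable_dirs, ensemble_size):
    """Allow an election only if a strict majority of the ensemble may vote."""
    voters = sum(1 for d in reachable_dirs if may_vote(d))
    return voters > ensemble_size // 2


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as b_dir, \
            tempfile.TemporaryDirectory() as c_dir:
        mark_data_valid(c_dir)  # C was down earlier but still has its data
        # B's disk failed, so its fresh data directory has no marker.
        print(can_elect([b_dir, c_dir], ensemble_size=3))  # False
```

With only C allowed to vote, there is no majority of the 3 servers, so the
ensemble stays down until A returns instead of electing a leader that has
never seen zxid1.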
From: thomas.john...@sun.com [thomas.john...@sun.com]
Sent: Tuesday, December 16, 2008 4:02 PM
Subject: Re: What happens when a server loses all its state?
Mahadev Konar wrote:
> Hi Thomas,
>> More generally, is it a safe assumption to make that the ZooKeeper
>> service will maintain all its guarantees if a minority of servers lose
>> persistent state (due to bad disks, etc) and restart at some point in
>> the future?
> Yes that is true.
Great - thanks Mahadev.
Not to drag this on more than necessary, please bear with me for one
more example of 'amnesia' that comes to mind. I have a set of ZooKeeper
servers A, B, C.
- C is currently not running, A is the leader, B is the follower.
- A proposes zxid1 to A and B; both acknowledge.
- A asks A to commit (which it persists), but before the same commit
request reaches B, all servers go down (say a power failure).
- Later, B and C come up (A is slow to reboot), but B has lost all state
due to disk failure.
- C becomes the new leader and perhaps continues with some more new
Likely I'm misunderstanding the protocol, but have I effectively lost
zxid1 at this point? What would happen when A comes back up?
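
The sequence above can be replayed in a toy model. This is a sketch under
strong assumptions, not the actual election protocol: each server is reduced
to the highest zxid found in its on-disk log, the leader is whichever voter
has the highest zxid (ties broken by server name), and elect_leader is a
made-up helper.

```python
def elect_leader(voters, ensemble_size):
    """Pick a leader from the servers that are up.

    voters maps server name -> highest zxid found in its log on disk.
    Returns (leader, zxid), or None when no majority is present.
    """
    if len(voters) <= ensemble_size // 2:
        return None
    leader = max(voters, key=lambda name: (voters[name], name))
    return leader, voters[leader]


if __name__ == "__main__":
    ZXID1 = 1
    # State on disk at the power failure: A and B logged zxid1, C was down.
    logs = {"A": ZXID1, "B": ZXID1, "C": 0}
    logs["B"] = 0  # B's disk fails, so it restarts with an empty log

    # Only B and C come back; A is still rebooting.
    up = {name: logs[name] for name in ("B", "C")}
    leader, zxid = elect_leader(up, ensemble_size=3)
    print(leader, zxid)   # C 0
    print(zxid < ZXID1)   # True: the new leader has never seen zxid1
```

In this model B and C form a majority even though neither holds zxid1, which
is the situation the question is asking about.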