RE: What happens when a server loses all its state?

Krishna Sankar (ksankar) Wed, 17 Dec 2008 12:22:13 -0800

Just as a supporting note: from what I have read, tolerating n
simultaneous failures requires 2n+1 nodes. In this case (two simultaneous
failures) we would need 5 nodes to operate correctly. It might be a good
idea to capture this formula and, if more than n failures occur, write
appropriate flags that can then be used to reach the right recovery state.
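
As a rough illustration of the 2n+1 arithmetic, here is a minimal Python
sketch. The function names are hypothetical and are not part of ZooKeeper;
the sketch only counts failures and says nothing about how a server would
detect that it has lost its state.

```python
def nodes_needed(max_failures: int) -> int:
    """Smallest ensemble size that tolerates max_failures simultaneous failures."""
    if max_failures < 0:
        raise ValueError("max_failures must be non-negative")
    return 2 * max_failures + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Largest number of simultaneous failures an ensemble of this size tolerates."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return (ensemble_size - 1) // 2


def quorum_size(ensemble_size: int) -> int:
    """Size of a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def within_tolerance(ensemble_size: int, failed: int) -> bool:
    """True if the number of failed servers does not exceed what the ensemble tolerates."""
    return failed <= tolerated_failures(ensemble_size)


if __name__ == "__main__":
    # The scenario in this thread: 3 servers, A is down and B lost its data.
    print(within_tolerance(3, 2))   # two failures out of three servers
    print(nodes_needed(2))          # ensemble size needed to survive two failures
```

Under these assumptions, a server that has lost all its data counts as
failed in the same way as a server that is down, which is why the 3-node
scenario below falls outside what the ensemble can tolerate.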


Cheers
<k/>  

|-----Original Message-----
|From: Benjamin Reed [mailto:[email protected]]
|Sent: Wednesday, December 17, 2008 11:48 AM
|To: [email protected]
|Subject: RE: What happens when a server loses all its state?
|
|Thomas,
|
|in the scenario you give you have two simultaneous failures with 3
|nodes, so it will not recover correctly. A is failed because it is not
|up. B has failed because it lost all its data.
|
|it would be good for ZooKeeper to not come up in that scenario. perhaps
|what we need is something similar to your safe state proposal. basically
|a server that has forgotten everything should not be allowed to vote in
|the leader election. that would avoid your scenario. we just need to put
|a flag file in the data directory to say that the data is valid and thus
|can vote.
|
|ben
|________________________________________
|From: [email protected] [[email protected]]
|Sent: Tuesday, December 16, 2008 4:02 PM
|To: [email protected]
|Subject: Re: What happens when a server loses all its state?
|
|Mahadev Konar wrote:
|> Hi Thomas,
|>
|>
|>
|>
|>> More generally, is it a safe assumption to make that the ZooKeeper
|>> service will maintain all its guarantees if a minority of servers lose
|>> persistent state (due to bad disks, etc) and restart at some point in
|>> the future?
|>>
|> Yes that is true.
|>
|>
|Great - thanks Mahadev.
|
|Not to drag this on more than necessary, please bear with me for one
|more example of 'amnesia' that comes to mind. I have a set of ZooKeeper
|servers A, B, C.
|- C is currently not running, A is the leader, B is the follower.
|- A proposes zxid1 to A and B, both acknowledge.
|- A asks A to commit (which it persists), but before the same commit
|request reaches B, all servers go down (say a power failure).
|- Later, B and C come up (A is slow to reboot), but B has lost all state
|due to disk failure.
|- C becomes the new leader and perhaps continues with some more new
|transactions.
|
|Likely I'm misunderstanding the protocol, but have I effectively lost
|zxid1 at this point? What would happen when A comes back up?
|
|Thanks.
