Mahadev Konar wrote:
The general scenario I was interested in was a minority of servers
losing state, and trying to understand what other correlated events
could cause issues. Just to be clear, since A has sent the commit to B
(or is it when A has got its own commit), it *could have* sent a success
back to the client before everything went down, correct?
Here is what would happen in the scenario you mentioned.
Great - thanks Mahadev.
Not to drag this on more than necessary, please bear with me for one
more example of 'amnesia' that comes to mind. I have a set of ZooKeeper
servers A, B, C.
- C is currently not running, A is the leader, B is the follower.
- A proposes zxid1 to A and B, both acknowledge.
- A asks A to commit (which it persists), but before the same commit
request reaches B, all servers go down (say a power failure).
In this case, the zookeeper protocol says that zxid1 would be available only
if the client gets a success. So zxid1 may or may not get committed if A and
B come up later. ( this is a different scenario then what you mention
- Later, B and C come up (A is slow to reboot), but B has lost all state
due to disk failure.
This is how zookeeper would work in this scenario ---
Now since we have B and C come up and B has the most recent state but loses
it, then zookeeper is clueless about this. So C would say I have the some
zxid say zxid-n and B would say that I have zxid = 0 (since its stateless)
and C would become a leader (since it has the highest zxid).
This would lead to loss of data and loss of state in zookeeper. That's what
I meant when I mentioned that zookeeper relies heavily on the state being
persisted on disk.
OK good, my understanding was correct then.
I wasn't aware that C would ask A to truncate even committed
transactions (the zookeeper internals doc/slides talks about proposals -
I suspect I may have some terminology confusion here). Another
possibility is C is now at zxid2 >= zxid1, in which case A could
possibly *not* get rid of the committed transaction?
- C becomes the new leader and perhaps continues with some more new
Now if A comes back again, C would say that its the leader and ask A to
truncate all the transactions that A had to come to sync with C.
Yes thanks. Not sure if this makes sense, but is it worthwhile to have a
'safe' mode when a server comes up with no state (I think it should be
simple to distinguish between having a clean disk 'no state'/corrupt
state and 'empty state')? In this case, I think it could simply wait
till it sees a successful propose/commit cycle to know that it is safe
for it to take a snapshot and start participating in the ensemble.
Again, you can see that how persistence loss can trigger state loss in
zookeeper. If its just minority of servers failing then this can be taken
care of by zookeeper but in this scenario is C failing and then being
brought up with an inconsisten state with another failure of A and data loss
of B -- which zookeeper cannot handle.
I hope this helps.
In the scenario I previously described, when B and C comes up, B would
not respond to C, but just watch - C would not be able to establish
quorum until A came up; at which point B has witnessed a successful
leader activation, and can join. If one is willing to sacrifice liveness
for safety in situations where 1 or more nodes have amnesia, would this
be a viable option?