According to your explanation, "*ZAB makes the guarantee that a proposal
which has been logged by a quorum of followers will eventually be
committed*". However, the source code of ZooKeeper, specifically
FastLeaderElection.java, shows that during the election the candidates only
provide their zxid in their votes, and the one with the max zxid wins the
election. That is, it seems that no check is made to ensure that the latest
proposal has been logged by a quorum of followers.
In this situation, ZooKeeper could deliver a proposal that the client has
already been told failed. Imagine this scenario: a ZooKeeper cluster with
5 servers, where the leader receives only 1 ACK for proposal A, and after a
timeout the client is told that proposal A failed. Then all servers restart
due to a power failure. A server that has proposal A in its log would become
the leader, even though the client was told that proposal A failed.
Am I misunderstanding something?
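To make the concern concrete, here is a toy model of the election rule as described above, where the vote carrying the largest zxid wins. It is a sketch under stated assumptions, not FastLeaderElection itself: ties are assumed to go to the higher server id, epochs and message exchange are ignored, and Vote, elect and the zxid values are hypothetical.

```python
# Toy model of "highest zxid wins"; not ZooKeeper's real FastLeaderElection.
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    sid: int   # hypothetical server id
    zxid: int  # last zxid this server has logged


def elect(votes: list[Vote]) -> Vote:
    """Pick the vote with the highest zxid; ties go to the higher server id
    (an assumed tie-breaker)."""
    return max(votes, key=lambda v: (v.zxid, v.sid))


# Scenario from the question: 5 servers, proposal A (zxid 101) was logged only
# by the old leader (server 1) and one follower (server 2) before the outage.
logs = {1: 101, 2: 101, 3: 100, 4: 100, 5: 100}
votes = [Vote(sid, zxid) for sid, zxid in logs.items()]

leader = elect(votes)
print(leader)  # Vote(sid=2, zxid=101)
logged_a = sum(1 for z in logs.values() if z == 101)
print(logged_a, "of", len(logs), "servers logged A")  # 2 of 5 servers logged A
```

In this model nothing in the election checks how many servers logged A; the winner's log simply includes it.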
On Wed, Jan 27, 2010 at 10:37 AM, Henry Robinson <he...@cloudera.com> wrote:
That part of the documentation is slightly confusing. The elected leader
must have the highest zxid that has been written to disk by a quorum of
followers. ZAB makes the guarantee that a proposal which has been logged by
a quorum of followers will eventually be committed. Conversely, any
proposals that *don't* get logged by a quorum before the leader sending them
dies will not be committed. One of the ZAB papers covers both of these
situations: making sure proposals are committed or skipped at the right time.
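The guarantee rests on quorum intersection: any two majorities of the same ensemble share at least one server, so a majority that later votes always includes someone who logged the proposal. The sketch below checks this exhaustively for small ensembles. It assumes simple majority quorums and summarizes each server's log by its last zxid; majorities and quorum_logged_survives are hypothetical helpers.

```python
from itertools import combinations


def majorities(n: int):
    """Yield every strict-majority subset of servers 1..n."""
    servers = range(1, n + 1)
    for size in range(n // 2 + 1, n + 1):
        for combo in combinations(servers, size):
            yield frozenset(combo)


def quorum_logged_survives(n: int, z: int) -> bool:
    """For every majority that logged zxid z and every majority that later
    votes, check that the highest zxid among the voters is at least z."""
    quorums = list(majorities(n))
    for logged in quorums:
        # Servers in `logged` have z; everyone else stopped at z - 1.
        last = {s: z if s in logged else z - 1 for s in range(1, n + 1)}
        for voters in quorums:
            if max(last[s] for s in voters) < z:
                return False
    return True


print(quorum_logged_survives(5, 101))  # True
```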
So you get the neat property that leader election can be live in exactly the
case where the ZK cluster is live. If a quorum of peers isn't available to
elect the leader, the resulting cluster won't be live anyhow, so it's OK for
leader election to fail.
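The liveness claim can be sketched the same way: election needs only a majority of the ensemble, and it cannot finish without one. This assumes simple majority quorums and the same hypothetical "highest zxid, then highest id" rule; try_elect is an illustrative helper, not a ZooKeeper API.

```python
def try_elect(last_zxid: dict[int, int], reachable: set[int], n: int):
    """Return the id of the elected leader, or None if no majority is reachable."""
    if len(reachable) <= n // 2:
        return None  # no majority: election (and the cluster) cannot progress
    # Highest zxid among the reachable servers, ties broken by server id.
    return max(reachable, key=lambda s: (last_zxid[s], s))


last = {1: 101, 2: 101, 3: 100, 4: 100, 5: 100}
print(try_elect(last, {1, 2}, 5))     # None
print(try_elect(last, {3, 4, 5}, 5))  # 5
```

With servers 1 and 2 (the only ones holding zxid 101) unreachable, servers 3 to 5 still form a quorum and elect a leader; zxid 101 never reached a quorum, so dropping it is allowed.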
FLP impossibility isn't actually strictly relevant for ZAB, because FLP
requires that message reordering is possible (see all the stuff in that
paper about non-deterministically drawing messages from a potentially
deliverable set). TCP FIFO channels don't reorder messages, so they provide
the extra signalling that ZAB requires.
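A toy illustration of why FIFO order matters to a follower: if messages from the leader arrive in send order, a COMMIT can never overtake the PROPOSAL it refers to. The (kind, zxid) message tuples and follower_apply are hypothetical and say nothing about ZooKeeper's wire format.

```python
from collections import deque


def follower_apply(channel: deque) -> list[int]:
    """Drain a channel of (kind, zxid) messages in arrival order and return the
    zxids delivered; raise if a COMMIT precedes its PROPOSAL."""
    logged: set[int] = set()
    delivered: list[int] = []
    while channel:
        kind, zxid = channel.popleft()
        if kind == "PROPOSAL":
            logged.add(zxid)          # follower writes the proposal to its log
        elif kind == "COMMIT":
            if zxid not in logged:
                raise RuntimeError(f"COMMIT {zxid} arrived before its PROPOSAL")
            delivered.append(zxid)    # safe to deliver: proposal is logged
    return delivered


# A FIFO channel (like a single TCP connection) keeps the leader's send order.
fifo = deque([("PROPOSAL", 1), ("PROPOSAL", 2), ("COMMIT", 1), ("COMMIT", 2)])
print(follower_apply(fifo))  # [1, 2]

# A channel that may reorder can hand the follower a COMMIT it cannot act on.
reordered = deque([("COMMIT", 1), ("PROPOSAL", 1)])
try:
    follower_apply(reordered)
except RuntimeError as err:
    print(err)  # COMMIT 1 arrived before its PROPOSAL
```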
2010/1/26 Qing Yan <qing...@gmail.com>
I have a question about how ZooKeeper *remembers* a commit operation.
The leader will issue a COMMIT to all followers as soon as a quorum of
followers have ACKed a message. Since messages are ACKed in order, COMMITs
will be sent by the leader as received by the followers in order.
COMMITs are processed in order. Followers deliver a proposal message when
that proposal is committed.
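The commit rule quoted above can be modeled as ACK counting on the leader. This is a hypothetical sketch, not ZooKeeper's actual leader code: zxids are assumed to be small consecutive integers, and the leader is assumed to count itself toward the majority. Nothing in it waits for followers to confirm a COMMIT.

```python
class ToyLeader:
    """Hypothetical leader bookkeeping: track which servers logged each zxid and
    emit COMMITs strictly in zxid order once a majority of the n servers
    (the leader counted as one of them) has logged that zxid."""

    def __init__(self, n: int, leader_id: int = 0):
        self.n = n
        self.leader_id = leader_id
        self.acks: dict[int, set[int]] = {}  # zxid -> ids of servers that logged it
        self.next_commit = 1                 # lowest zxid not yet committed
        self.commits: list[int] = []         # COMMITs sent, in send order

    def propose(self, zxid: int) -> None:
        # The leader logs its own proposal before sending it to followers.
        self.acks[zxid] = {self.leader_id}

    def on_ack(self, zxid: int, follower_id: int) -> None:
        self.acks[zxid].add(follower_id)
        # Commit every consecutive zxid that now has a majority, in order.
        while (self.next_commit in self.acks
               and len(self.acks[self.next_commit]) > self.n // 2):
            self.commits.append(self.next_commit)  # "send" COMMIT to all
            self.next_commit += 1


toy = ToyLeader(n=5)
toy.propose(1)
toy.propose(2)
toy.on_ack(1, follower_id=1)
toy.on_ack(2, follower_id=1)
toy.on_ack(1, follower_id=2)   # zxid 1 is now on 3 of 5 servers
print(toy.commits)             # [1]
toy.on_ack(2, follower_id=2)   # zxid 2 is now on 3 of 5 servers
print(toy.commits)             # [1, 2]
```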
My question is: will the leader wait for the COMMIT to be processed by a
quorum of followers before considering the COMMIT successful? From the
documentation it seems that the leader issues the COMMIT asynchronously and
doesn't expect confirmation from followers. In the extreme case, what if the
leader issues a COMMIT to all followers and crashes immediately, before the
COMMIT message can go out over the network? How does the system remember
that the COMMIT ever happened?
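In ZAB's recovery, a new leader synchronizes its followers to its own history and then commits that history, so a commit can be re-derived from the quorum's logs even if the COMMIT message itself was lost. A rough model of that idea, assuming logs are non-empty lists of small integer zxids and ignoring epochs, truncation and snapshots; recover is a hypothetical helper.

```python
def recover(logs: dict[int, list[int]], voters: set[int]) -> tuple[int, list[int]]:
    """Toy recovery: elect the voter whose log ends in the highest zxid (ties to
    the higher id) and treat that leader's whole log as the committed history
    every voter is synchronized to."""
    leader = max(voters, key=lambda s: (logs[s][-1], s))
    history = list(logs[leader])
    for s in voters:
        logs[s] = list(history)  # followers adopt the leader's history
    return leader, history


# Old leader (server 1) had zxid 3 logged on servers 1, 2 and 3, a majority of
# 5, then crashed before any COMMIT for zxid 3 left the machine.
logs = {2: [1, 2, 3], 3: [1, 2, 3], 4: [1, 2], 5: [1, 2]}
leader, history = recover(logs, voters={2, 4, 5})
print(leader, history)  # 2 [1, 2, 3]
```

Any three of the four surviving servers include server 2 or 3, so whichever quorum votes, zxid 3 ends up in the committed history.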
Actually, this is related to the leader election process:
ZooKeeper messaging doesn't care about the exact method of electing a
leader, as long as the following holds:
The leader has seen the highest zxid of all the followers.
A quorum of servers have committed to following the leader.
Of these two requirements, only the first, the highest zxid among the
followers, needs to hold for correct operation.
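"Highest zxid" is a plain integer comparison. In ZooKeeper a zxid is a 64-bit number whose high-order 32 bits hold the leader epoch and whose low-order 32 bits hold a counter, so any proposal from a newer epoch outranks every proposal from an older one. A small sketch of that packing; make_zxid and split_zxid are hypothetical helper names.

```python
EPOCH_SHIFT = 32
COUNTER_MASK = (1 << 32) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack epoch into the high 32 bits and counter into the low 32 bits
    (the counter is masked, so values above 32 bits are truncated)."""
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)


def split_zxid(zxid: int) -> tuple[int, int]:
    """Return (epoch, counter) for a packed zxid."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


old = make_zxid(1, 500)  # late proposal in epoch 1
new = make_zxid(2, 1)    # first proposal in epoch 2
print(new > old)         # True
print(hex(new))          # 0x200000001
```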
Is there a liveness issue in trying to find "the leader has seen the highest
zxid of all the followers"? What if some of the followers (which happen to be
holding the highest zxid) cannot be contacted (FLP impossibility result?)
It would be more straightforward if COMMIT required confirmation from a
quorum of the followers. But I guess things get optimized thanks to Zab's
FIFO nature... I just want to hear some clarification about it.
Made in Zhejiang University