Henry Robinson commented on ZOOKEEPER-107:
Thanks for the proposal - it does a really good job of framing the important
I am in favour of a solution that uses ZAB and the existing consensus framework
for dynamic group membership. I believe this can be achieved without an
out-of-band protocol or significant changes to the way the current protocols
work; this has the advantage of keeping things simple.
I'm not certain I've read your proposal correctly, but it seems that step 6 has
followers commit the CONFIGCHANGE proposal on receipt, rather than waiting for
a COMMIT message. By my understanding of ZAB, this means that if the leader
fails halfway through sending the proposal messages, fewer than a quorum of
followers may commit the proposal, which could lead to divergent histories at
followers.
The tool approach is one way of wrapping up the authentication required if an
ensemble wishes to restrict those nodes that can join it. Currently there is
some implicit authentication done as the leader only establishes connections
with followers that belong to the static membership. However, there's certainly
a need, as a result of this JIRA, for a better authentication mechanism inside
ZK. I see this as orthogonal to the mechanisms required to do dynamic
I suggest that we simply augment the current ZooKeeper protocol with four new
proposals: NEWVIEW, GETVIEW, JOIN and LEAVE. NEWVIEW proposes an entirely new
view, and may aggregate many JOIN or LEAVE proposals into one. Since NEWVIEW
likely requires knowledge of the current view, GETVIEW returns the current view
and its version. JOIN and LEAVE incrementally change the current view, whatever
it is, and so do not require a GETVIEW call to establish the current view.
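As a rough illustration of how the four proposals relate, here is a minimal
Java sketch. Everything in it is hypothetical and not part of ZooKeeper: the
class name View, the use of long server ids, and in particular the version
check on NEWVIEW, which is one assumed way of using the version that GETVIEW
returns. GETVIEW itself corresponds to simply reading the current View object.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical immutable view: a versioned set of server ids (not ZooKeeper API). */
final class View {
    final long version;
    final Set<Long> members;

    View(long version, Set<Long> members) {
        this.version = version;
        this.members = Collections.unmodifiableSet(new TreeSet<>(members));
    }

    /** JOIN: incremental change applied to whatever the current view is. */
    View join(long serverId) {
        Set<Long> next = new TreeSet<>(members);
        next.add(serverId);
        return new View(version + 1, next);
    }

    /** LEAVE: incremental change applied to whatever the current view is. */
    View leave(long serverId) {
        Set<Long> next = new TreeSet<>(members);
        next.remove(serverId);
        return new View(version + 1, next);
    }

    /**
     * NEWVIEW: replaces the whole membership. Assumption: the proposer passes
     * the version it read via GETVIEW, and a stale version is rejected.
     */
    View newView(long expectedVersion, Set<Long> proposed) {
        if (expectedVersion != version) {
            throw new IllegalStateException("stale view version " + expectedVersion);
        }
        return new View(version + 1, proposed);
    }
}
```

JOIN and LEAVE take no version argument because they apply to whatever the
current view is, whereas NEWVIEW is checked against the version the proposer
last read.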
All proposals go through the usual ZAB two-phase protocol, except that the
leader coordinating the current ZAB instance must wait for
acknowledgements from quorums in both the current and new view before
committing the change.
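The commit rule above can be sketched as follows. This is only an
illustration: DualQuorumTracker and its methods are hypothetical names, and it
assumes simple majority quorums over long server ids rather than any other
quorum scheme.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: tracks acks for one view-change proposal. */
final class DualQuorumTracker {
    private final Set<Long> currentView;
    private final Set<Long> newView;
    private final Set<Long> acked = new HashSet<>();

    DualQuorumTracker(Set<Long> currentView, Set<Long> newView) {
        this.currentView = new HashSet<>(currentView);
        this.newView = new HashSet<>(newView);
    }

    /** Record an acknowledgement from a server in either view. */
    void ack(long serverId) {
        acked.add(serverId);
    }

    /** Assumption: a quorum is a strict majority of the view's members. */
    private boolean hasMajority(Set<Long> view) {
        int count = 0;
        for (long id : view) {
            if (acked.contains(id)) {
                count++;
            }
        }
        return count > view.size() / 2;
    }

    /** The change may commit only once both views have a quorum of acks. */
    boolean canCommit() {
        return hasMajority(currentView) && hasMajority(newView);
    }
}
```

For example, moving from {1, 2, 3} to {1, 2, 3, 4, 5}, acks from servers 1 and
2 form a quorum in the current view but not in the new one; a further ack from
server 4 satisfies both and allows the commit.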
This can cause the proposal to block if a quorum cannot be assembled in one of
the two views. Although it might seem an error that the proposal
will block even if a quorum in the current view can be established, the same
behaviour would be observed even if the proposal could be committed - all
subsequent proposals would require a quorum from the new view and would block.
If an ensemble is currently blocked due to the failure of n/2 + 1 nodes, it is
not possible to resume progress by issuing a LEAVE on behalf of the failed
nodes; however, in general failed nodes may both JOIN and LEAVE the ensemble.
If a leader election is required during a proposal, there are no correctness
issues assuming the current required invariants of ZAB leader election hold. In
particular, as long as the new leader has seen the most recent proposals then
the view change proposal will be committed once the new leader is elected. This
property will be maintained without changes to the current leader election
protocols - as the view change proposal will have been seen by a quorum from
the current view, the new leader is guaranteed to have a record of the
A node that fails after it has issued a join proposal, but before it hears of
its success must be able to find the status of the proposal once it recovers.
There are several ways to do this.
I have some sketches of correctness proofs for this and could produce a more
detailed design document if required - however, if there's consensus that this
is the right approach I'd rather get coding :) It turns out, after much
agonising, that ZK's existing invariants are already pretty much strong enough
to build this protocol. The only extension is the requirement to listen for two
different sets of quorum acknowledgements.
I've deliberately avoided the issue of exposing the view to the outside world
(although this requires attention, as new nodes need to be able to find the
ensemble!) - I have outlined some ideas earlier in this JIRA and I know other
people have good suggestions, but I think we can solve both issues
Would love to hear comments, things that I've missed, errors in logic etc.
> Allow dynamic changes to server cluster membership
> Key: ZOOKEEPER-107
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
> Project: Zookeeper
> Issue Type: Improvement
> Components: server
> Reporter: Patrick Hunt
> Attachments: SimpleAddition.rtf
> Currently cluster membership is statically defined, adding/removing hosts
> to/from the server cluster dynamically needs to be supported.