[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719641#action_12719641
 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

Hi - 

Thanks for the proposal - it does a really good job of framing the important 
questions. 

I am in favour of a solution that uses ZAB and the existing consensus framework 
for dynamic group membership. I believe this can be achieved without an 
out-of-band protocol or significant changes to the way the current protocols 
work; this has the advantage of keeping things simple. 

I'm not certain I've read your proposal correctly, but it seems that step 6 has 
followers commit the CONFIGCHANGE proposal on receipt, rather than waiting for 
a COMMIT message. By my understanding of ZAB, this means that if the leader 
fails halfway through sending the proposal messages, fewer than a quorum of 
followers may commit the proposal, which could leave followers with divergent 
histories. 

The tool approach is one way of wrapping up the authentication required if an 
ensemble wishes to restrict those nodes that can join it. Currently there is 
some implicit authentication done as the leader only establishes connections 
with followers that belong to the static membership. However, there's certainly 
a need, as a result of this JIRA, for a better authentication mechanism inside 
ZK. I see this as orthogonal to the mechanisms required to do dynamic 
membership.

I suggest that we simply augment the current ZooKeeper protocol with four new 
proposals: NEWVIEW, GETVIEW, JOIN and LEAVE. NEWVIEW proposes an entirely new 
view, and may aggregate many JOIN or LEAVE proposals into one. Since NEWVIEW 
likely requires knowledge of the current view, GETVIEW returns the current view 
and its version. JOIN and LEAVE incrementally change the current view, whatever 
it is, and so do not require a GETVIEW call to establish the current view. 
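
As a rough illustration of how the four operations relate to each other, here 
is a minimal sketch. The View class, the version bump on every change, and the 
stale-version check in new_view are assumptions made for illustration; none of 
this is an existing ZooKeeper API. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical view: a version number plus the set of member ids."""
    version: int
    members: frozenset


def get_view(view):
    # GETVIEW: return the current view and its version.
    return view.version, view.members


def join(view, node):
    # JOIN: incremental change, no knowledge of the current view needed.
    return View(view.version + 1, view.members | {node})


def leave(view, node):
    # LEAVE: incremental change, no knowledge of the current view needed.
    return View(view.version + 1, view.members - {node})


def new_view(view, expected_version, members):
    # NEWVIEW: replaces the whole view. Assumption: the caller passes the
    # version it obtained from GETVIEW, and a stale version is rejected.
    if expected_version != view.version:
        raise ValueError("stale view version")
    return View(view.version + 1, frozenset(members))
```

A caller that wants to aggregate several changes would call get_view first and 
then submit one new_view built from the result, whereas join and leave can be 
issued blind. 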

All proposals go through the usual ZAB two-phase protocol, except that the 
leader coordinating the current ZAB instance must wait for acknowledgements 
from a quorum in both the current view and the new view before committing the 
change. 
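
The commit rule for a view change could be sketched as follows. This assumes 
simple majority quorums; the function names are hypothetical and only model the 
leader's decision, not the messaging. 

```python
def has_quorum(acks, view):
    # Only acknowledgements from members of the view count towards its quorum.
    return len(set(acks) & set(view)) > len(view) // 2


def can_commit_view_change(acks, current_view, new_view):
    # The leader commits only once both views have acknowledged by majority.
    return has_quorum(acks, current_view) and has_quorum(acks, new_view)
```

For example, when growing {a, b, c} to {a, b, c, d, e}, acknowledgements from a 
and b alone are a quorum of the current view but not of the new one, so the 
leader keeps waiting until a third member of the new view acknowledges. 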

It's possible that this can lead to the proposal blocking if a quorum cannot be 
assembled in one of the two views. Although it might seem an error that the 
proposal will block even if a quorum in the current view can be established, 
the ensemble would block anyway if the proposal were committed: all subsequent 
proposals would require a quorum from the new view and would block. 

If an ensemble of n nodes is currently blocked due to the failure of n/2 + 1 
nodes, it is not possible to resume progress by issuing a LEAVE on behalf of 
the failed nodes, since the LEAVE proposal itself needs a quorum from the 
current view; however, in general failed nodes may both JOIN and LEAVE the 
ensemble. 
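
A small worked example of the blocked case, assuming majority quorums and 
hypothetical helper names: 

```python
def has_quorum(alive, view):
    # A view has a quorum if a strict majority of its members is alive.
    return len(set(alive) & set(view)) > len(view) // 2


def leave_can_commit(alive, current_view, leaving):
    # A LEAVE needs acknowledgements from a quorum of the current view
    # and from a quorum of the view that results from the removal.
    new_view = set(current_view) - set(leaving)
    return has_quorum(alive, current_view) and has_quorum(alive, new_view)
```

With current view {a, b, c, d, e} and only a and b alive, removing c, d and e 
would give a view in which a and b are a majority, but the LEAVE still cannot 
commit because a and b are not a majority of the current five-node view. 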

If a leader election is required during a proposal, there are no correctness 
issues assuming the current required invariants of ZAB leader election hold. In 
particular, as long as the new leader has seen the most recent proposals then 
the view change proposal will be committed once the new leader is elected. This 
property will be maintained without changes to the current leader election 
protocols - as the view change proposal will have been seen by a quorum from 
the current view, the new leader is guaranteed to have a record of the 
proposal. 
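
The argument rests on the fact that any two majorities of the same view share 
at least one node, so a quorum that elects a new leader must include someone 
who saw the view change proposal. The brute-force check below only illustrates 
that intersection property for small views, assuming majority quorums; it is 
not a model of the leader election protocol itself. 

```python
from itertools import combinations


def majorities(view):
    # Enumerate every subset of the view that forms a strict majority.
    nodes = sorted(view)
    need = len(nodes) // 2 + 1
    for size in range(need, len(nodes) + 1):
        for quorum in combinations(nodes, size):
            yield frozenset(quorum)


def all_majorities_intersect(view):
    # True if every pair of majority quorums has a node in common.
    quorums = list(majorities(view))
    return all(a & b for a in quorums for b in quorums)
```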

A node that fails after it has issued a JOIN proposal, but before it hears of 
its success, must be able to find the status of the proposal once it recovers. 
There are several ways to do this. 

I have some sketches of correctness proofs for this and could produce a more 
detailed design document if required - however, if there's consensus that this 
is the right approach I'd rather get coding :) It turns out, after much 
agonising, that ZK's existing invariants are already pretty much strong enough 
to build this protocol. The only extension is the requirement to listen for two 
different sets of quorum acknowledgements.

I've deliberately avoided the issue of exposing the view to the outside world 
(although this requires attention, as new nodes need to be able to find the 
ensemble!) - I have outlined some ideas earlier in this JIRA and I know other 
people have good suggestions, but I think we can solve both issues 
independently. 

Would love to hear comments, things that I've missed, errors in logic etc.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts 
> to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
