[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719641#action_12719641 ]
Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

Hi -

Thanks for the proposal - it does a really good job of framing the important questions.

I am in favour of a solution that uses ZAB and the existing consensus framework for dynamic group membership. I believe this can be achieved without an out-of-band protocol or significant changes to the way the current protocols work; this has the advantage of keeping things simple.

I'm not certain I've read your proposal correctly, but it seems that step 6 has followers commit the CONFIGCHANGE proposal on receipt, rather than waiting for a COMMIT message. By my understanding of ZAB, this means that if the leader fails halfway through sending the proposal messages, fewer than a quorum of followers may commit this proposal, leading to the possibility of divergent histories at followers.

The tool approach is one way of wrapping up the authentication required if an ensemble wishes to restrict those nodes that can join it. Currently there is some implicit authentication done, as the leader only establishes connections with followers that belong to the static membership. However, there's certainly a need, as a result of this JIRA, for a better authentication mechanism inside ZK. I see this as orthogonal to the mechanisms required to do dynamic membership.

I suggest that we simply augment the current ZooKeeper protocol with four new proposals: NEWVIEW, GETVIEW, JOIN and LEAVE. NEWVIEW proposes an entirely new view, and may aggregate many JOIN or LEAVE proposals into one. Since NEWVIEW likely requires knowledge of the current view, GETVIEW returns the current view and its version. JOIN and LEAVE incrementally change the current view, whatever it is, and so do not require a GETVIEW call to establish the current view.
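To make the four proposals concrete, here is a minimal sketch of how they might act on a versioned view. Everything in it is an assumption for illustration: the `View` type, the function names, and in particular the rule that NEWVIEW is rejected when it is based on a stale version are hypothetical and not part of any existing ZooKeeper API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """An ensemble membership view (hypothetical type, for illustration only)."""
    version: int
    members: frozenset


def get_view(current):
    # GETVIEW: return the current membership together with its version.
    return current.members, current.version


def join(current, node):
    # JOIN: incremental change relative to whatever the current view is,
    # so the caller does not need to know the current version.
    return View(current.version + 1, current.members | {node})


def leave(current, node):
    # LEAVE: incremental change; likewise needs no prior GETVIEW.
    return View(current.version + 1, current.members - {node})


def new_view(current, members, based_on_version):
    # NEWVIEW: replaces the whole membership, so the proposer must have seen
    # the current view. Assumed rule: reject a proposal built on a stale version.
    if based_on_version != current.version:
        raise ValueError("NEWVIEW was based on a stale view version")
    return View(current.version + 1, frozenset(members))
```

In this sketch a caller that wants to replace the view first calls `get_view`, then passes the returned version to `new_view`; `join` and `leave` can be issued blindly.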
All proposals go through the usual ZAB two-phase protocol, except that the leader coordinating the current ZAB instance must wait for acknowledgements from quorums in both the current and new view before committing the change. It's possible that this can lead to the proposal blocking if a quorum cannot be assembled in either view. Although it might seem an error that the proposal will block even if a quorum in the current view can be established, the same behaviour would be observed even if the proposal could be committed - all subsequent proposals would require a quorum from the new view and would block.

If an ensemble is currently blocked due to the failure of n/2 + 1 nodes, it is not possible to resume progress by issuing a LEAVE on behalf of the failed nodes; however, in general failed nodes may both JOIN and LEAVE the ensemble.

If a leader election is required during a proposal, there are no correctness issues assuming the current required invariants of ZAB leader election hold. In particular, as long as the new leader has seen the most recent proposals, the view change proposal will be committed once the new leader is elected. This property will be maintained without changes to the current leader election protocols - as the view change proposal will have been seen by a quorum from the current view, the new leader is guaranteed to have a record of the proposal.

A node that fails after it has issued a join proposal, but before it hears of its success, must be able to find the status of the proposal once it recovers. There are several ways to do this.

I have some sketches of correctness proofs for this and could produce a more detailed design document if required - however, if there's consensus that this is the right approach I'd rather get coding :)

It turns out after much agonising that ZK's existing invariants are already pretty much strong enough to build this protocol.
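The commit rule for view changes described above, waiting for acknowledgements from quorums in both the current and the new view, can be sketched as follows. The function names are hypothetical, and a quorum is assumed here to be a strict majority of a view's members; this is an illustration, not the ZooKeeper implementation.

```python
def has_quorum(acks, view):
    # Assumed quorum rule: a strict majority of the view's members have acked.
    # Acks from nodes outside the view are ignored.
    return len(acks & view) > len(view) // 2


def can_commit_view_change(acks, current_view, new_view):
    # The leader may commit the view change only once majorities of BOTH the
    # current view and the proposed view have acknowledged the proposal.
    return has_quorum(acks, current_view) and has_quorum(acks, new_view)
```

For example, growing the view from `{"a", "b", "c"}` to `{"a", "b", "c", "d", "e"}` cannot commit on acks from `a` and `b` alone: that is a majority of the current view but not of the new one. Once `d` also acknowledges, both quorums are satisfied.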
The only extension is the requirement to listen for two different sets of quorum acknowledgements.

I've deliberately avoided the issue of exposing the view to the outside world (although this requires attention, as new nodes need to be able to find the ensemble!) - I have outlined some ideas earlier in this JIRA and I know other people have good suggestions, but I think we can solve both issues independently.

Would love to hear comments, things that I've missed, errors in logic etc.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
> Key: ZOOKEEPER-107
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
> Project: Zookeeper
> Issue Type: Improvement
> Components: server
> Reporter: Patrick Hunt
> Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts
> to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.