[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706219#action_12706219 ]
Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

I agree with pretty much everything I've read here (in particular, the importance of getting consensus!), but wanted to clarify my initial comment a bit.

Rather than choose between strategies 1 and 2 as outlined by Benjamin, I think a hybrid approach is needed.

If a node is a member of a quorate cluster, then the most up-to-date membership information should be available to it in a znode. I think this is the most elegant approach, and it is trivially achieved by pushing join/leave requests through the atomic broadcast pipeline.

If a node is joining the cluster, it needs to be able to bootstrap the location of the cluster from somewhere. There therefore needs to be an externally available resource containing a list of machines in the cluster that is accurate for at least one machine (as a joining node will try all servers in that list in turn). When I say "available at some URI", this is what I mean. Currently, this information is kept statically at a URI that addresses conf/zoo.cfg on the local filesystem. I suggest generalising that to an arbitrary URI. One nice property is that the cluster is then not tied to a particular machine, as the URI provides a level of indirection.

It is then the cluster administrator's responsibility to keep the contents of this URI up to date (although of course this should be automated), possibly via a client that just pulls membership information from the cluster periodically. As I said earlier, it's only important for the contents of this list to have one node in common with the true membership of the cluster, so it's allowed to get a bit out of sync. We can certainly easily imagine ways that ZK can help here. Of course the URI must be highly available, but it also has to exist, otherwise we could have 'orphaned' clusters that are running on machines whose identity we don't necessarily know.
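
To make the bootstrap step concrete, here is a rough sketch in Python. It is only an illustration of the idea, not existing ZooKeeper code: the one-host:port-per-line format, the function names parse_server_list and find_live_member, and the try_join callback are all assumptions made up for this example. It shows why the list only needs one entry in common with the true membership: the joining node walks the list and stops at the first server that accepts it.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Address = Tuple[str, int]


def parse_server_list(text: str) -> List[Address]:
    """Parse a hypothetical bootstrap list: one host:port per line.

    Blank lines and anything after a '#' are ignored.
    """
    servers: List[Address] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, _, port = line.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"bad bootstrap entry: {raw!r}")
        servers.append((host, int(port)))
    return servers


def find_live_member(
    servers: Iterable[Address],
    try_join: Callable[[Address], bool],
) -> Optional[Address]:
    """Try each listed server in turn; return the first that accepts the join.

    try_join is a stand-in for whatever join request the real protocol
    would send. Stale entries simply fail and are skipped.
    """
    for addr in servers:
        if try_join(addr):
            return addr
    return None
```

For example, if the list names three machines and only the last one is still a member, find_live_member returns that last address after the first two attempts fail. If no entry is a current member it returns None, which is the 'orphaned cluster' case described above.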

The URI can be a front for almost any scheme we like; periodic heartbeating of live nodes is one. The format of this file can be anything at all, from a serialised snapshot to a list of ip:port pairs, as long as it contains enough information for a client to find the cluster. Personally I would prefer simple, human-readable formats.

To talk about recovery for a moment: when a node recovers from a crash and rejoins the cluster, it can help the cluster elect a master if the cluster is currently non-quorate. This is because it was originally part of the cluster, and therefore the protocol guarantees that a quorum of nodes including the recovering one will have seen all committed proposals (this is important to correctness).

If the node was not originally a member of the cluster, it must not help get a master elected, as it cannot be part of a quorum. Similarly, a node cannot query the cluster to find out whether it was originally a member, because the quorum required to answer might not exist. Therefore every node that ever successfully joins a cluster must record this fact in its own persistent storage, as only it can know whether it is permitted to help run the election.

Finally, the startup problem. Given a URI, nodes can bootstrap themselves onto a cluster simply by being told to start in startup mode. Alternatively, a single node can be distinguished (again, in the URI contents perhaps) which will start in single-node mode and process join requests one by one.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts
> to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.