[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706219#action_12706219
 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

I agree with pretty much everything I've read here (in particular, the 
importance of getting consensus!), but wanted to clarify my initial comment a 
bit.

Rather than choose between strategies 1 and 2 as outlined by Benjamin, I think 
a hybrid approach is needed. 

If a node is a member of a quorate cluster, then the most up-to-date membership 
information should be available to it in a znode. I think this is the most 
elegant approach, and is trivially achieved by pushing join/leave requests 
through the atomic broadcast pipeline.
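As a rough illustration of that idea, the sketch below treats membership as a 
small state machine that every server updates by applying join/leave 
operations in broadcast order. The names MembershipOp, MembershipView and the 
zxid field are hypothetical, chosen for this example only; this is not the 
ZooKeeper implementation, just one way the delivered operations could be 
turned into the contents of a membership znode.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MembershipOp:
    """One membership change, as delivered by the broadcast layer (hypothetical)."""
    zxid: int    # position in the agreed total order
    kind: str    # "join" or "leave"
    server: str  # e.g. "host:port"


@dataclass
class MembershipView:
    """Membership as seen by one server after applying delivered operations."""
    members: set = field(default_factory=set)
    last_zxid: int = 0

    def apply(self, op: MembershipOp) -> None:
        # Operations arrive in broadcast order; skip anything already applied
        # so that a replay after recovery does not change the result.
        if op.zxid <= self.last_zxid:
            return
        if op.kind == "join":
            self.members.add(op.server)
        elif op.kind == "leave":
            self.members.discard(op.server)
        else:
            raise ValueError(f"unknown membership operation: {op.kind!r}")
        self.last_zxid = op.zxid

    def serialize(self) -> bytes:
        # Simple human-readable payload that could be stored in a znode.
        return "\n".join(sorted(self.members)).encode("utf-8")
```

Because every server applies the same operations in the same order, every 
server ends up with the same serialized membership.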

If a node is joining the cluster, it needs to be able to bootstrap the location 
of the cluster from somewhere. There therefore needs to be an externally 
available resource containing a list of machines in the cluster that is 
accurate for at least one machine (as a joining node will try all servers in 
that list in turn). When I say available at some URI, this is what I mean. 
Currently, this information is kept statically at a URI that addresses 
conf/zoo.cfg on the local filesystem. I suggest generalising that to an 
arbitrary URI. One nice property is that it then does not tie a cluster to a 
particular machine, as the URI provides a level of indirection.

It is then the cluster administrator's responsibility to keep this URI 
up-to-date (although of course this should be automated), possibly via a client 
that just pulls membership information from the cluster periodically. As I said 
earlier, it's only important for the contents of this list to have at least one 
node in common with the true membership of the cluster, so it's allowed to get a 
bit out of sync. It is easy to imagine ways that ZK can help here. Of 
course the URI must be highly available, but it also has to exist, otherwise we 
could have 'orphaned' clusters that are running on machines whose identity we 
don't necessarily know. The URI can be a front for almost any scheme we like - 
periodic heartbeating of live nodes is one. 
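The "one node in common" condition and the periodic refresh could be sketched 
as below. Both function names, and the fetch_members and publish callables, 
are hypothetical stand-ins for whatever client and storage scheme sits behind 
the URI.

```python
def bootstrap_list_usable(published, true_members):
    """True if the published list shares at least one entry with the cluster."""
    return not set(published).isdisjoint(true_members)


def refresh_published_list(fetch_members, publish):
    """One refresh step: pull the membership from the cluster and publish it.

    fetch_members() returns an iterable of server addresses; publish(list)
    writes them to wherever the URI points. An empty result is not
    published, so a failed fetch cannot wipe out the last known list.
    """
    members = sorted(fetch_members())
    if members:
        publish(members)
    return members
```

An administrator could run refresh_published_list from a scheduled job; the 
list may lag behind the cluster as long as bootstrap_list_usable stays true.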

The format of this file can be anything at all - from a serialised snapshot to 
a list of ip:port pairs, as long as it contains enough information for a client 
to find the cluster. Personally I would prefer human readable, simple formats.
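As an example of the simple, human-readable end of that range, the sketch 
below parses one host:port pair per line, with blank lines and # comments 
ignored. The format and the name parse_server_list are assumptions for 
illustration, not something proposed in this issue.

```python
def parse_server_list(text):
    """Parse 'host:port' lines into a list of (host, port) tuples."""
    servers = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the last colon so the port is always the final field.
        host, sep, port = line.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"malformed server entry: {raw!r}")
        servers.append((host, int(port)))
    return servers
```

For instance, a file containing the two lines zk1.example.com:2181 and 
zk2.example.com:2181 parses to two (host, port) tuples in file order.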

To talk about recovery for a moment: when a node recovers from a crash and 
rejoins the cluster, it can help the cluster elect a master if the cluster is 
currently non-quorate. This is because it was originally part of the cluster, and 
therefore the protocol guarantees that a quorum of nodes including the 
recovering one will have seen all committed proposals (this is important to 
correctness).

If the node was not originally a member of the cluster, it must not help get a 
master elected as it cannot be part of a quorum. Similarly, a node cannot query 
the cluster to find out if it was originally a member because the quorum 
required to do so might not exist. Therefore every node that ever successfully 
joins a cluster must store this fact in its own persistent storage, as only it 
can know whether it is permitted to help run the election. 
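One way a node could record that fact locally is sketched below. The marker 
file name, the cluster_id argument and both function names are hypothetical; 
the only point is that the record is written durably when the join succeeds 
and consulted before the node takes part in an election.

```python
import os
from pathlib import Path

MARKER_NAME = "joined_cluster"  # hypothetical file name inside the data dir


def record_join(data_dir, cluster_id):
    """Durably record that this node has successfully joined cluster_id."""
    data_dir = Path(data_dir)
    tmp = data_dir / (MARKER_NAME + ".tmp")
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(cluster_id + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    # Atomic rename: the marker is either absent or complete, never partial.
    os.replace(tmp, data_dir / MARKER_NAME)


def may_participate_in_election(data_dir, cluster_id):
    """True only if this node previously recorded a join to cluster_id."""
    try:
        stored = (Path(data_dir) / MARKER_NAME).read_text(encoding="utf-8")
    except FileNotFoundError:
        return False  # never joined: must not help elect a master
    return stored.strip() == cluster_id
```

The check reads only local storage, so it still works when no quorum exists 
to answer the question.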

Finally, the startup problem. Given a URI, nodes can bootstrap themselves onto 
a cluster simply by being told to start in startup mode. Alternatively, a 
single node can be distinguished (again, in the URI contents perhaps) which 
will start in single-node mode and process join requests one-by-one. 
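The second option, a distinguished node admitting others one at a time, might 
look roughly like this. The function name bootstrap_from_seed is made up for 
the example, and the quorum size assumes simple majority quorums.

```python
from collections import deque


def bootstrap_from_seed(seed, join_requests):
    """Grow a cluster from a single seed node, one join at a time.

    Returns the list of (members, quorum_size) states the cluster passes
    through, assuming majority quorums.
    """
    members = [seed]
    pending = deque(join_requests)
    history = [(tuple(members), 1)]  # a single node is its own quorum
    while pending:
        candidate = pending.popleft()
        if candidate in members:
            continue  # duplicate request: already a member
        members.append(candidate)
        history.append((tuple(members), len(members) // 2 + 1))
    return history
```

Processing the requests strictly one by one means each intermediate 
membership is well defined, so the quorum size at every step is unambiguous.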


> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts 
> to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
