[
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706219#action_12706219
]
Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------
I agree with pretty much everything I've read here (in particular, the
importance of getting consensus!), but wanted to clarify my initial comment a
bit.
Rather than choose between strategies 1 and 2 as outlined by Benjamin, I think
there's a hybrid approach needed.
If a node is a member of a quorate cluster, then the most up to date membership
information should be available to it in a znode. I think this is the most
elegant approach, and is trivially achieved by pushing join/leave requests
through the atomic broadcast pipeline.
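
To make the "join/leave through the broadcast pipeline" idea concrete, here is
a minimal sketch. It assumes each committed membership change arrives as an
(op, server) pair in broadcast order; the operation names and the function
name are made up for illustration and are not part of ZooKeeper.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical operation names; no wire format is defined in this comment.
JOIN, LEAVE = "join", "leave"


def apply_membership_log(initial: Iterable[str],
                         log: Iterable[Tuple[str, str]]) -> FrozenSet[str]:
    """Replay committed (op, server) entries in broadcast order.

    Every node that replays the same ordered log ends up with the same
    membership set, which is what could then be exposed in a znode.
    """
    members = set(initial)
    for op, server in log:
        if op == JOIN:
            members.add(server)
        elif op == LEAVE:
            members.discard(server)  # leaving twice is harmless
        else:
            raise ValueError(f"unknown membership op: {op!r}")
    return frozenset(members)
```

Because the log is totally ordered, two nodes cannot disagree about the result
as long as they have applied the same prefix.
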
If a node is joining the cluster, it needs to be able to bootstrap the location
of the cluster from somewhere. There therefore needs to be an externally
available resource containing a list of machines in the cluster that is
accurate for at least one machine (as a joining node will try all servers in
that list in turn). When I say "available at some URI", this is what I mean.
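
The "try all servers in turn" step might look roughly like the sketch below.
The try_connect callback is a stand-in for whatever handshake a joining node
would really perform; both names are assumptions, not existing ZooKeeper APIs.

```python
from typing import Callable, Iterable, Optional


def find_live_server(candidates: Iterable[str],
                     try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that accepts a connection, or None.

    try_connect is a hypothetical hook: it should return True when the
    address belongs to a reachable cluster member.
    """
    for address in candidates:
        try:
            if try_connect(address):
                return address
        except OSError:
            # Stale or unreachable entry: fall through to the next one.
            continue
    return None
```

The list only has to contain one good entry for this loop to succeed, which is
why it can tolerate being somewhat out of date.
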
Currently, this information is kept statically at a URI that addresses
conf/zoo.cfg on the local filesystem. I suggest generalising that to an
arbitrary URI. One nice property is that it then does not tie a cluster to a
particular machine, as the URI provides a level of indirection.
It is then the cluster administrator's responsibility to keep this URI
up-to-date (although of course this should be automated), possibly via a client
that just pulls membership information from the cluster periodically. As I said
earlier, it's only important for the contents of this list to have one node in
common with the true membership of the cluster, so it's allowed to get a bit
out of sync. We can certainly easily imagine ways that ZK can help here. Of
course the URI must be highly available, but it also has to exist, otherwise we
could have 'orphaned' clusters that are running on machines whose identity we
don't necessarily know. The URI can be a front for almost any scheme we like -
periodic heartbeating of live nodes is one.
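
The "one node in common" condition is easy to state in code. This is only a
sketch of the check an automated updater could run before deciding the
published list needs refreshing; the function name is made up.

```python
from typing import Iterable


def bootstrap_list_usable(published: Iterable[str],
                          live_members: Iterable[str]) -> bool:
    """True if at least one published address is a current cluster member."""
    return not set(published).isdisjoint(live_members)
```
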
The format of this file can be anything at all - from a serialised snapshot to
a list of ip:port pairs, as long as it contains enough information for a client
to find the cluster. Personally I would prefer human readable, simple formats.
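
As an example of the simple end of that spectrum, a parser for a file with one
host:port pair per line could be as small as this. The format details (blank
lines ignored, '#' starting a comment) are my assumptions, not a proposal for
the final format.

```python
from typing import List, Tuple


def parse_server_list(text: str) -> List[Tuple[str, int]]:
    """Parse one host:port pair per line into (host, port) tuples."""
    servers = []
    for raw in text.splitlines():
        # Assumed convention: '#' starts a comment, blank lines are skipped.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, sep, port = line.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {line!r}")
        servers.append((host, int(port)))
    return servers
```
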
To talk about recovery for a moment: when a node recovers from a crash and
rejoins the cluster, it can help the cluster elect a master if the cluster is
currently non-quorate. This is because it was originally part of the cluster, and
therefore the protocol guarantees that a quorum of nodes including the
recovering one will have seen all committed proposals (this is important to
correctness).
If the node was not originally a member of the cluster, it must not help get a
master elected as it cannot be part of a quorum. Similarly, a node cannot query
the cluster to find out if it was originally a member because the quorum
required to do so might not exist. Therefore every node that ever successfully
joins a cluster must store this fact in its own persistent storage, as only it
can know whether it is permitted to help run the election.
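
A rough sketch of how a node could record that fact durably and consult it on
restart. The marker file name and both function names are hypothetical; a real
implementation would presumably keep this alongside the node's existing
on-disk state.

```python
import os

MARKER_NAME = "joined_cluster"  # hypothetical file name


def record_join(data_dir: str) -> None:
    """Durably record that this node has successfully joined the cluster."""
    path = os.path.join(data_dir, MARKER_NAME)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("joined\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    os.replace(tmp, path)  # atomic rename: marker is either absent or complete


def may_vote_in_election(data_dir: str) -> bool:
    """Only a node that previously joined may help elect a master."""
    return os.path.exists(os.path.join(data_dir, MARKER_NAME))
```

The point is that the check is purely local: it works even when no quorum
exists to ask.
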
Finally, the startup problem. Given a URI, nodes can bootstrap themselves onto
a cluster simply by being told to start in startup mode. Alternatively, a
single node can be distinguished (again, in the URI contents perhaps) which
will start in single-node mode and process join requests one-by-one.
> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
> Key: ZOOKEEPER-107
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
> Project: Zookeeper
> Issue Type: Improvement
> Components: server
> Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts
> to/from the server cluster dynamically needs to be supported.