Yes, by replacing at most x servers at a time out of 2x+1 you have quorum intersection. I have one more question: ZooKeeper itself doesn't assume perfect failure detection, which your scheme requires. What if the VM didn't actually fail but was just slow, and then tries to reconnect?
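To make the 2x+1 arithmetic concrete, here is a rough sketch (not ZooKeeper code; the class and method names are made up for illustration). It assumes every lost server is replaced one-for-one, so the ensemble size stays the same, and it checks only that the servers present both before and after the replacement still form a majority:

```java
// Hypothetical helper, not part of ZooKeeper.
class QuorumOverlap {
    // Smallest number of servers that forms a majority of an ensemble of size n.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // True if the servers that were NOT replaced still form a majority.
    // Assumes one-for-one replacement, so old and new ensembles have the same size.
    static boolean survivorsFormMajority(int ensembleSize, int replaced) {
        int survivors = ensembleSize - replaced;
        return survivors >= majority(ensembleSize);
    }

    public static void main(String[] args) {
        int x = 2;
        int n = 2 * x + 1; // ensemble of 5, as in the A,B,C,D,E scenario
        for (int replaced = 0; replaced <= n; replaced++) {
            System.out.println("replaced=" + replaced
                    + " survivorsFormMajority=" + survivorsFormMajority(n, replaced));
        }
    }
}
```

With n = 2x+1 the check passes exactly when at most x servers are replaced at once. It says nothing about the failure-detection question above.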
On Thu, Mar 15, 2012 at 2:50 AM, Christian Ziech <[email protected]> wrote:
> I don't think that we could be running into a split-brain problem in our
> use case.
> Let me try to describe the scenario we are worried about (assuming an
> ensemble of 5 nodes A, B, C, D, E):
> - The ensemble is up and running and in sync.
> - Node A with the host name "zookeeperA.whatever-domain.priv" goes down
>   because the VM has gone away.
> - That removal of the VM is detected and a new VM is spawned with the same
>   host name "zookeeperA.whatever-domain.priv"; let's call that node A'.
> - The ZooKeeper server on node A' wants to join the cluster. Right now this
>   gets rejected by the others since A' has a different IP address than A
>   (and the old one is "cached" in the InetSocketAddress of the QuorumPeer
>   instance).
>
> We could ensure that at any given time there is at most one node with the
> host name "zookeeperA.whatever-domain.priv" known to the ensemble and that
> once a node is replaced, it would not come back. We could also make sure
> that our ensemble is big enough to compensate for a replacement of up to
> x nodes at a time (by setting it to x*2 + 1 nodes).
>
> So if I did not misjudge our problem, it should be (due to the
> restrictions) simpler than the problem to be solved by ZOOKEEPER-107. My
> intention is basically, by solving this smaller discrete problem, to not
> need to wait until ZOOKEEPER-107 makes it into a release (the assumption is
> that a smaller fix possibly has a chance to make it into the 3.4.x branch
> even).
>
> On 15.03.2012 07:46, ext Alexander Shraer wrote:
>>
>> Hi Christian,
>>
>> ZK-107 would indeed allow you to add/remove servers and change their
>> addresses.
>>
>> > We could ensure that we always have a more or less fixed quorum of
>> > zookeeper servers with a fixed set of host names.
>>
>> You should probably also ensure that a majority of the old ensemble
>> intersects with a majority of the new one.
>> Otherwise you have to run a reconfiguration protocol similar to ZK-107.
>> For example, if you have 3 servers A, B and C, and now you're adding D and
>> E that replace B and C, how would this work? It is probable that D and E
>> don't have the latest state (as you mention) and A is down or doesn't have
>> the latest state either (a minority might not have the latest state).
>> Also, how do you prevent split brain in this case, meaning B and C
>> thinking that they are still operational? Perhaps I'm missing something,
>> but I suspect that the change you propose won't be enough...
>>
>> Best Regards,
>> Alex
>>
>>
>> On Wed, Mar 14, 2012 at 10:01 AM, Christian Ziech
>> <[email protected]> wrote:
>>
>>     Just a small addition: In my opinion the patch could really boil
>>     down to adding a
>>
>>     quorumServer.electionAddr = new
>>     InetSocketAddress(electionAddr.getHostName(),
>>     electionAddr.getPort());
>>
>>     in the catch(IOException e) clause of the connectOne() method of
>>     the QuorumCnxManager. In addition, one should perhaps make the
>>     electionAddr field in the QuorumPeer.QuorumServer class volatile
>>     to prevent races.
>>
>>     I haven't fully checked this change for implications yet, but a
>>     quick test on some machines at least showed it would solve our
>>     use case. What do the more expert users / maintainers think: is
>>     it even worthwhile to go that route?
>>
>>     On 14.03.2012 17:04, ext Christian Ziech wrote:
>>
>>         Let me describe our upcoming use case in a few words: We are
>>         planning to use zookeeper in a cloud where typically nodes come
>>         and go unpredictably. We could ensure that we always have a
>>         more or less fixed quorum of zookeeper servers with a fixed
>>         set of host names. However, the IPs associated with the host
>>         names would change every time a new server comes up.
>>         I browsed the code a little and it seems right now that the only
>>         problem is that the zookeeper server is remembering the resolved
>>         InetSocketAddress in its QuorumPeer hash map.
>>
>>         I saw that ZOOKEEPER-107 would possibly also solve that
>>         problem, but possibly in a more generic way than actually
>>         needed (perhaps here I underestimate the impact of joining as
>>         a server with an empty data directory to replace a server that
>>         previously had one).
>>
>>         Given that, from looking at ZOOKEEPER-107, it seems that it
>>         will still take some time for the proposed fix to make it into
>>         a release, would it make sense to invest time into a smaller
>>         fix just for this "replacing a dropped server without
>>         rolling restarts" use case? Would there be a chance that a fix
>>         for this makes it into the 3.4.x branch?
>>
>>         Are there perhaps other ways to get this use case supported
>>         without the need for doing rolling restarts whenever we need
>>         to replace one of the zookeeper servers?
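
For reference, the re-resolution idea from the quoted thread can be tried in isolation. This is only a sketch under assumptions: `PeerAddress` and `reResolve()` are hypothetical stand-ins for QuorumPeer.QuorumServer and the proposed line in connectOne(), not the actual ZooKeeper classes. It relies on the fact that constructing an InetSocketAddress from a host name and port attempts a new name lookup, which is still subject to whatever DNS caching the JVM and the OS apply.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in for QuorumPeer.QuorumServer; only the election address is modelled.
class PeerAddress {
    // volatile so that a thread opening connections sees an address replaced by another thread
    volatile InetSocketAddress electionAddr;

    PeerAddress(InetSocketAddress electionAddr) {
        this.electionAddr = electionAddr;
    }

    // Rebuild the address from the host name and port of the current one.
    // The constructor tries to resolve the host name again, so a changed IP
    // can be picked up (subject to JVM and OS DNS caching).
    InetSocketAddress reResolve() {
        InetSocketAddress old = electionAddr;
        InetSocketAddress fresh = new InetSocketAddress(old.getHostName(), old.getPort());
        electionAddr = fresh;
        return fresh;
    }
}

class ReResolveDemo {
    public static void main(String[] args) {
        // "localhost" and port 3888 are placeholder values for the demo.
        PeerAddress peer = new PeerAddress(InetSocketAddress.createUnresolved("localhost", 3888));
        System.out.println("before: " + peer.electionAddr);
        // In the proposal this call would sit in the catch (IOException e) clause
        // after a failed connection attempt.
        System.out.println("after:  " + peer.reResolve());
    }
}
```

The sketch only shows the lookup being repeated; it does not address the state and split-brain concerns raised in the thread.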
