Let me describe our upcoming use case in a few words: We are planning to
use ZooKeeper in a cloud where nodes typically come and go unpredictably.
We could ensure that we always have a more or less fixed quorum of
ZooKeeper servers with a fixed set of host names. However, the IPs
associated with the host names would change every time a new server
comes up. I browsed the code a little, and right now the only problem
seems to be that the ZooKeeper server remembers the resolved
InetSocketAddress in its QuorumPeer hash map.
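To illustrate what I mean by "remembering the resolved address": a
java.net.InetSocketAddress built from a host name resolves that name once,
at construction, and keeps the resulting IP for its lifetime. A rough
sketch of the kind of re-resolution I have in mind follows. The class
ReResolve and the helper reResolve are hypothetical names made up for this
example and are not part of the ZooKeeper code base; port 2888 and the
loopback address are placeholders.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration only; not taken from the ZooKeeper sources.
final class ReResolve {

    // Build a new InetSocketAddress from the host string of a cached one.
    // The constructor attempts name resolution again, so a host name that
    // now maps to a different IP yields the new IP.
    // Note: the JVM keeps its own DNS cache (networkaddress.cache.ttl),
    // which may also need to be short for this to pick up changes.
    static InetSocketAddress reResolve(InetSocketAddress cached) {
        // getHostString() returns the host name (or the literal IP if the
        // address was created from one) without a reverse lookup.
        return new InetSocketAddress(cached.getHostString(), cached.getPort());
    }

    public static void main(String[] args) {
        // Placeholder peer address; a real quorum entry would use a host name.
        InetSocketAddress cached = new InetSocketAddress("127.0.0.1", 2888);
        InetSocketAddress fresh = reResolve(cached);
        System.out.println("cached: " + cached);
        System.out.println("fresh:  " + fresh);
    }
}
```

Something along these lines could presumably be done before each
connection attempt to a peer, instead of reusing the address object that
was created at startup.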
I saw that ZOOKEEPER-107 might also solve that problem, but possibly in
a more generic way than we actually need (though perhaps I underestimate
the impact of joining as a server with an empty data directory to
replace a server that previously had one).
Given that, from looking at ZOOKEEPER-107, it seems it will still take
some time for the proposed fix to make it into a release, would it make
sense to invest time in a smaller fix just for this "replacing a
dropped server without rolling restarts" use case? Would there be a
chance that such a fix makes it into the 3.4.x branch? Are there perhaps
other ways to support this use case without doing rolling restarts
whenever we need to replace one of the ZooKeeper servers?
- Zookeeper on short lived VMs and ZOOKEEPER... Christian Ziech