Hi all,

I have a 5-node quorum with 1 node per rack. Unfortunately, I need to free up these 5 racks and move all their machines somewhere else. This means changing the IP addresses of all the machines in these 5 racks, since they will land in other racks.

All our applications are configured with a quorum specification of "zookeeper". This resolves to zookeeper.datacenter.internaldomain.com, which returns 5 A records. All our Java applications are also started with -Dsun.net.inetaddr.ttl=600, just in case that matters, because the JDK has the wonderful idea of caching DNS names forever in some configurations, regardless of the record's TTL. With this flag, the JDK caches DNS entries for at most 10 minutes.

I reviewed the FAQ (http://wiki.apache.org/hadoop/ZooKeeper/FAQ; BTW there's a broken image in the answer to the 1st question) and the admin guide (http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html) and searched the intertubes a bit, but I couldn't come up with an answer.

Can I move the ZK machines one by one, re-IP them, and hope they'll rejoin the quorum gracefully? If I allow 20-30 minutes between machine moves, we should always have a quorum, provided that a machine can rejoin the quorum with a different IP. Is that the case? Has anyone tested this?

If that doesn't work, can you think of anything else I can do to gracefully move the quorum?

Thanks.

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
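
P.S. In case it helps anyone reproduce the DNS side of this, here is a minimal sketch (not our actual code) of how a Java client sees the A records behind a name and how the positive DNS cache TTL can be capped from code. It assumes the standard networkaddress.cache.ttl security property, which is the documented counterpart of the -Dsun.net.inetaddr.ttl flag; the class and method names are made up for illustration, and "localhost" is only a stand-in default for the real quorum name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class DnsTtlSketch {

    // Cap the JVM's positive DNS cache at the given number of seconds.
    // This should run before the first lookup, since the JVM reads the
    // caching policy once when the resolver classes are initialized.
    static void configureTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    // Return every address the name currently resolves to, as strings.
    static List<String> resolveAll(String host) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    public static void main(String[] args) throws UnknownHostException {
        configureTtl(600); // 10 minutes, same value as the flag above
        String host = args.length > 0 ? args[0] : "localhost";
        for (String address : resolveAll(host)) {
            System.out.println(host + " -> " + address);
        }
    }
}
```

Running it with the quorum name as the only argument, once before and once more than 10 minutes after a machine has been re-IPed, should show whether a long-running JVM would pick up the new address.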
