You have a problem on the client side. Unfortunately, clients only resolve the hostnames when they open a ZooKeeper handle, so that is one problem; see ZOOKEEPER-338. (It would be really nice to fix that one :) It's not hard.) I think the servers also have a similar problem when resolving the addresses of their peers.

ben

On Wed, May 11, 2011 at 12:53 PM, tsuna <[email protected]> wrote:
> Hi all,
> I have a 5-node quorum with 1 node per rack. It's unfortunate but it
> so happens that I need to free up these 5 racks and move all their
> machines somewhere else. This requires me to change the IP addresses
> of all the machines in these 5 racks as they land in other racks.
>
> All our applications are configured with a quorum specification of
> "zookeeper". This ends up being resolved as
> zookeeper.datacenter.internaldomain.com, which returns 5 A records.
> All our Java applications are also started with
> -Dsun.net.inetaddr.ttl=600, just in case that matters, because the JDK
> by default has the wonderful idea to cache all DNS names forever
> regardless of the TTL. So with this flag the JDK will cache DNS
> entries for 10 minutes maximum.
>
> I reviewed the FAQ (http://wiki.apache.org/hadoop/ZooKeeper/FAQ – BTW
> there's a broken image in answer to the 1st question) and the admin
> guide (http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html) and
> searched on the intertubes a bit, but I couldn't come up with an
> answer.
>
> Can I move the ZK machines one by one and re-IP them and hope they'll
> rejoin the quorum gracefully? If I allow like 20-30 minutes between
> each machine move, we should always have a quorum, provided that the
> machines can re-join the quorum with a different IP. Is that the
> case? Has anyone tested this?
>
> If that doesn't work, can you think of anything I can do to gracefully
> move the quorum?
>
> Thanks.
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
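
For illustration of the client-side point in the reply above (hostnames are resolved only when a handle is opened), here is a rough sketch of how an application might look up the current A records for the quorum name itself, right before it opens a fresh handle. This is not ZooKeeper's own code and not the fix proposed in ZOOKEEPER-338. The class and method names (QuorumResolver, resolveAll, joinHostPorts) are made up for this example, port 2181 is an assumed client port, and setting the networkaddress.cache.ttl security property is assumed here as a programmatic counterpart to the -Dsun.net.inetaddr.ttl=600 flag mentioned in the quoted message.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper.
public class QuorumResolver {

    // Joins already-resolved addresses into "ip:port,ip:port" form.
    // Note: IPv6 literals would need extra handling; this sketch assumes IPv4.
    public static String joinHostPorts(List<String> addresses, int port) {
        StringBuilder sb = new StringBuilder();
        for (String address : addresses) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(address).append(':').append(port);
        }
        return sb.toString();
    }

    // Looks up every address the resolver currently returns for the name.
    public static List<String> resolveAll(String hostname) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(hostname)) {
            result.add(address.getHostAddress());
        }
        return result;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Cap the JDK's positive DNS cache at 10 minutes. This should run
        // before the first lookup in the JVM to be sure it takes effect.
        Security.setProperty("networkaddress.cache.ttl", "600");

        // Defaults to "localhost" so the sketch runs anywhere; a real
        // deployment would pass its own quorum name (e.g. "zookeeper").
        String name = args.length > 0 ? args[0] : "localhost";
        int assumedClientPort = 2181; // assumption, not stated in the thread

        System.out.println(joinHostPorts(resolveAll(name), assumedClientPort));
    }
}
```

The resulting string has the comma-separated host:port shape that a ZooKeeper connect string uses, so an application could pass it when it closes an old handle and opens a new one after the machines have been re-IPed. Whether that is acceptable depends on how the application copes with losing its session.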
