Hello,

We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6
and encountered no issues during the upgrade.

This is what the ZooKeeper config looked like:



After the upgrade, we had to migrate server.22 on the same node to the
FOO.bar.com domain name because of Kerberos referral issues. We also gave it
a different server identifier, 23, when we migrated. Here is what the new
config looked like:



We restarted all the nodes in the ensemble with the updated config above.
The migrated node joined the quorum successfully and served all clients
directly connected to it without any issues.

Recently, when a leader election happened,
server.23=node5.foo.bar.com (the migrated node) was chosen as Leader (as it
has the highest ID). But then ZooKeeper was unable to serve any clients, and
all the servers were somehow still trying to establish a channel to 22 (old
DNS name: node5.bar.com) and were throwing the error below in a loop:



Fetching the config from the live ZooKeeper znode also doesn't show "22" as
a member of the ensemble. It's not clear how "22" is still coming into the
picture.



We suspected some weird caching issue and restarted ZooKeeper across all the
nodes, but that didn't help. Whenever node5 becomes the Leader, ID:22 pops
up again. We even rebooted node5, and that hasn't helped either.

We also looked at the '/zookeeper/config' content in the snapshot files and
did not find any reference to ID:22.
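
For anyone who wants to repeat this kind of check on their own ensemble, here
is a minimal Python sketch that compares the server IDs in two config texts
and reports IDs present in the old one but missing from the new one. The
helper names (parse_servers, stale_ids) are hypothetical, and the sample
entries are assumptions: only the node5 hostnames and the IDs 22 and 23 come
from this post, while the 2888:3888 ports are just the conventional defaults.

```python
import re

# Matches lines such as "server.22=node5.bar.com:2888:3888" and captures
# the numeric ID and the hostname. Commented-out lines do not match.
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*([^:\s]+)", re.MULTILINE)


def parse_servers(config_text):
    """Return {server_id: hostname} for every server.N=host... line."""
    return {int(sid): host for sid, host in SERVER_LINE.findall(config_text)}


def stale_ids(old_text, new_text):
    """Return the sorted server IDs present in old_text but not in new_text."""
    old = parse_servers(old_text)
    new = parse_servers(new_text)
    return sorted(set(old) - set(new))


# Assumed sample data, reduced to the one server that was migrated.
OLD_CONFIG = "server.22=node5.bar.com:2888:3888\n"
NEW_CONFIG = "server.23=node5.foo.bar.com:2888:3888\n"

if __name__ == "__main__":
    print("IDs dropped by the migration:", stale_ids(OLD_CONFIG, NEW_CONFIG))
    print("Servers in the new config:", parse_servers(NEW_CONFIG))
```

The same functions could be pointed at the text of each node's config files
and at the output fetched from the '/zookeeper/config' znode, to confirm
that none of them still lists the old ID.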

Any help would be greatly appreciated.

Thanks,
Rajkiran


