What version of ZK are you using? There's a bug in 3.4.x where 5-node clusters can fail to agree on a leader. It is only fixed in the as-yet-unreleased 3.4.4.

Hanno

On 27.07.2012, at 06:40, Jared Cantwell <[email protected]> wrote:

> I have a 5 node cluster configured using dynamic zookeeper. It has been
> through several reconfigurations, but at the moment I am simply trying to
> start 3 of the nodes to get ZK accessible. I have confirmed that the myid
> files match the entries in the dynamic membership file for the 3 nodes in
> question. However, when I start up the three nodes I get the following
> error:
>
> 2012-07-26 22:26:01,037 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181
> :Leader@445] - LEADING - LEADER ELECTION TOOK - 13
> 2012-07-26 22:26:01,039 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181
> :FileSnap@83] - Reading snapshot /sf/data/zookeeper/
> 10.10.5.27/version-2/snapshot.3000001e3
> 2012-07-26 22:26:01,065 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181
> :FileTxnSnapLog@270] - Snapshotting: 0x3000001e3 to /sf/data/zookeeper/
> 10.10.5.27/version-2/snapshot.3000001e3
> 2012-07-26 22:26:10,837 [myid:8] - INFO
> [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8
> (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9
> (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
> 2012-07-26 22:26:20,849 [myid:8] - INFO
> [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8
> (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9
> (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
> 2012-07-26 22:26:21,083 [myid:8] - WARN [QuorumPeer[myid=8]/10.10.5.27:2181
> :QuorumPeer@949] - Unexpected exception
> java.lang.InterruptedException: *Timeout while waiting for epoch from quorum
> *
> at
> org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1207)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:464)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:946)
> 2012-07-26 22:26:21,083 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181
> :Leader@614] - Shutting down
> 2012-07-26 22:26:21,083 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181
> :Leader@620] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Forcing shutdown
> at
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:620)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:952)
> 2012-07-26 22:26:21,084 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181
> :ZooKeeperServer@413] - shutting down
> 2012-07-26 22:26:21,084 [myid:8] - INFO
> [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:2182:Leader$LearnerCnxAcceptor@407] -
> exception while shutting down acceptor: java.net.SocketException: Socket
> closed
>
> I am not sure what to make of it or how to debug from here. Any pointers
> or suggestions on how to debug what might be wrong, or simply some usual
> causes of this error would be appreciated.
>
> Thanks!
> Jared
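
As a debugging aid, it may help to tally the FastLeaderElection notifications in a log like the one quoted above, to see which server ids the would-be leader actually hears from and what state they report. The following is only a rough sketch: the function name `summarize_notifications` and the regular expression are my own, written against the "Notification:" lines as they appear in the quoted log, and the field layout may differ in other ZooKeeper versions.

```python
import re
from collections import Counter

# Fields of a "Notification:" entry as printed in the quoted log.
# Assumption: other ZooKeeper versions may format this line differently.
NOTIFICATION = re.compile(
    r"Notification:\s*(?P<leader>\d+)\s*\(n\.leader\),\s*"
    r"(?P<zxid>0x[0-9a-fA-F]+)\s*\(n\.zxid\),\s*"
    r"(?P<round>0x[0-9a-fA-F]+)\s*\(n\.round\),\s*"
    r"(?P<state>\w+)\s*\(n\.state\),\s*"
    r"(?P<sid>\d+)\s*\(n\.sid\)"
)


def summarize_notifications(log_text):
    """Count notifications per (sender id, sender state, proposed leader)."""
    # Mail clients wrap long log lines and prefix them with "> ",
    # so strip the quote markers and join everything into one line first.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return Counter(
        (int(m["sid"]), m["state"], int(m["leader"]))
        for m in NOTIFICATION.finditer(flat)
    )


if __name__ == "__main__":
    import sys

    # Usage: python summarize_notifications.py < zookeeper.log
    for (sid, state, leader), count in sorted(summarize_notifications(sys.stdin.read()).items()):
        print(f"sid={sid} state={state} votes_for={leader} seen={count}")
```

In the excerpt above, both notifications come from server 9, which still reports LOOKING while voting for 8. A tally over the full logs of all three servers would show whether the remaining server is heard from at all.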
