We are currently testing out 3.5.0. If the fix made it into 3.4.4, I assume that issue is also fixed in 3.5.0?
~Jared

On Thu, Jul 26, 2012 at 11:11 PM, Hanno Schlichting <[email protected]> wrote:

> What version of ZK are you using? There's a bug in 3.4.x with 5 node
> clusters failing to agree on a leader. That's only solved in the yet
> unreleased 3.4.4.
>
> Hanno
>
> On 27.07.2012, at 06:40, Jared Cantwell <[email protected]> wrote:
>
> > I have a 5 node cluster configured using dynamic zookeeper. It has been
> > through several reconfigurations, but at the moment I am simply trying to
> > start 3 of the nodes to get ZK accessible. I have confirmed that the myid
> > files match the entries in the dynamic membership file for the 3 nodes in
> > question. However, when I start up the three nodes I get the following
> > error:
> >
> > 2012-07-26 22:26:01,037 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@445] - LEADING - LEADER ELECTION TOOK - 13
> > 2012-07-26 22:26:01,039 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:FileSnap@83] - Reading snapshot /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
> > 2012-07-26 22:26:01,065 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:FileTxnSnapLog@270] - Snapshotting: 0x3000001e3 to /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
> > 2012-07-26 22:26:10,837 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
> > 2012-07-26 22:26:20,849 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
> > 2012-07-26 22:26:21,083 [myid:8] - WARN [QuorumPeer[myid=8]/10.10.5.27:2181:QuorumPeer@949] - Unexpected exception
> > java.lang.InterruptedException: *Timeout while waiting for epoch from quorum*
> >     at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1207)
> >     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:464)
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:946)
> > 2012-07-26 22:26:21,083 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@614] - Shutting down
> > 2012-07-26 22:26:21,083 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@620] - Shutdown called
> > java.lang.Exception: shutdown Leader! reason: Forcing shutdown
> >     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:620)
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:952)
> > 2012-07-26 22:26:21,084 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:ZooKeeperServer@413] - shutting down
> > 2012-07-26 22:26:21,084 [myid:8] - INFO [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:2182:Leader$LearnerCnxAcceptor@407] - exception while shutting down acceptor: java.net.SocketException: Socket closed
> >
> > I am not sure what to make of it or how to debug from here. Any pointers
> > or suggestions on how to debug what might be wrong, or simply some usual
> > causes of this error would be appreciated.
> >
> > Thanks!
> > Jared
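
For reference, the expectation that 3 of 5 nodes should be enough to bring ZK up rests on majority-quorum arithmetic. The following is only a minimal sketch of that arithmetic, assuming plain majority quorums (no weights or groups) and assuming all 5 servers are voting members; the function names are hypothetical and are not part of the ZooKeeper API.

```python
def majority_quorum_size(voters: int) -> int:
    """Smallest number of voters that is strictly more than half of them."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def has_quorum(running: int, voters: int) -> bool:
    """True if the running voters form a majority of the configured voters."""
    return running >= majority_quorum_size(voters)


if __name__ == "__main__":
    # Assumed scenario from the thread: 5 configured voters, 3 started.
    print(majority_quorum_size(5))
    print(has_quorum(3, 5))
```

Under these assumptions three running voters do form a majority of five, so a leader that still times out waiting for a quorum suggests looking at which membership each server actually loaded, or at a version-specific election bug such as the one mentioned above.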
