It looks like I had a firewall issue with the leader election port on one of the nodes. Sorry for the spam.
~Jared

On Thu, Jul 26, 2012 at 11:20 PM, Jared Cantwell <[email protected]> wrote:

> We are currently testing out 3.5.0. If the fix made it into 3.4.4, I
> assume that issue is also fixed in 3.5.0?
>
> ~Jared
>
> On Thu, Jul 26, 2012 at 11:11 PM, Hanno Schlichting <[email protected]> wrote:
>
>> What version of ZK are you using? There's a bug in 3.4.x with 5 node
>> clusters failing to agree on a leader. That's only solved in the yet
>> unreleased 3.4.4.
>>
>> Hanno
>>
>> On 27.07.2012, at 06:40, Jared Cantwell <[email protected]> wrote:
>>
>> > I have a 5 node cluster configured using dynamic zookeeper. It has been
>> > through several reconfigurations, but at the moment I am simply trying to
>> > start 3 of the nodes to get ZK accessible. I have confirmed that the myid
>> > files match the entries in the dynamic membership file for the 3 nodes in
>> > question. However, when I start up the three nodes I get the following
>> > error:
>> >
>> > 2012-07-26 22:26:01,037 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@445] - LEADING - LEADER ELECTION TOOK - 13
>> > 2012-07-26 22:26:01,039 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:FileSnap@83] - Reading snapshot /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
>> > 2012-07-26 22:26:01,065 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:FileTxnSnapLog@270] - Snapshotting: 0x3000001e3 to /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
>> > 2012-07-26 22:26:10,837 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
>> > 2012-07-26 22:26:20,849 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
>> > 2012-07-26 22:26:21,083 [myid:8] - WARN [QuorumPeer[myid=8]/10.10.5.27:2181:QuorumPeer@949] - Unexpected exception
>> > java.lang.InterruptedException: *Timeout while waiting for epoch from quorum*
>> >     at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1207)
>> >     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:464)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:946)
>> > 2012-07-26 22:26:21,083 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@614] - Shutting down
>> > 2012-07-26 22:26:21,083 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@620] - Shutdown called
>> > java.lang.Exception: shutdown Leader! reason: Forcing shutdown
>> >     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:620)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:952)
>> > 2012-07-26 22:26:21,084 [myid:8] - INFO [QuorumPeer[myid=8]/10.10.5.27:2181:ZooKeeperServer@413] - shutting down
>> > 2012-07-26 22:26:21,084 [myid:8] - INFO [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:2182:Leader$LearnerCnxAcceptor@407] - exception while shutting down acceptor: java.net.SocketException: Socket closed
>> >
>> > I am not sure what to make of it or how to debug from here. Any pointers
>> > or suggestions on how to debug what might be wrong, or simply some usual
>> > causes of this error would be appreciated.
>> >
>> > Thanks!
>> > Jared
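
For anyone who lands on this thread with the same "Timeout while waiting for epoch from quorum" error: the cause here turned out to be a firewall blocking the leader election port on one node, so a plain TCP reachability check between the nodes is a cheap first step. The following is a minimal Python sketch, not part of ZooKeeper. The function names `port_reachable` and `check_ensemble` are made up for illustration, and the hosts and ports you pass in must come from your own server entries.

```python
import socket


def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ensemble(hosts, ports, timeout=2.0):
    """Try every (host, port) pair and map each pair to True or False."""
    return {
        (host, port): port_reachable(host, port, timeout)
        for host in hosts
        for port in ports
    }


def format_report(results):
    """Render the results as one 'host:port status' line per pair."""
    lines = []
    for (host, port), ok in sorted(results.items()):
        status = "open" if ok else "blocked or not listening"
        lines.append(f"{host}:{port} {status}")
    return "\n".join(lines)
```

As a usage example, `print(format_report(check_ensemble(["10.10.5.27"], [2182, 2183])))` would probe one of the nodes from the log above. Port 2182 is the one the `LearnerCnxAcceptor` log line shows; 2183 is only a placeholder, because the election port never appears in this thread. Firewall rules can differ per direction, so it is worth running the check from every node against every other node while the servers are up. A successful connect only shows that the port is open and something is listening, not that the ensemble is healthy.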
