I upgraded my ZooKeeper cluster last week from 3.2.1 to 3.3.1, in an attempt to
get away from a client bug that was crashing my backend services.
Unfortunately, this morning I had a server crash, and it brought down my entire
cluster. I don't have the logs leading up to the crash, because --
argghffbuggle -- log4j wasn't set up correctly. But I restarted all three
nodes, and nodes two and three came back up and formed a quorum.
Node one, meanwhile, does this:
2010-06-02 17:04:56,446 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
2010-06-02 17:04:56,446 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82]
- Reading snapshot
/services/zookeeper/data/zookeeper/version-2/snapshot.a00000045
2010-06-02 17:04:56,476 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id
= 1, Proposed zxid = 47244640287
2010-06-02 17:04:56,486 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1,
47244640287, 4, 1, LOOKING, LOOKING, 1
2010-06-02 17:04:56,486 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3,
38654707048, 3, 1, LOOKING, LEADING, 3
2010-06-02 17:04:56,486 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3,
38654707048, 3, 1, LOOKING, FOLLOWING, 2
2010-06-02 17:04:56,486 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
2010-06-02 17:04:56,486 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:zookeeperser...@151] - Created server with
tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
/services/zookeeper/data/zookeeper/version-2 snapdir
/services/zookeeper/data/zookeeper/version-2
2010-06-02 17:04:56,486 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71]
- Leader epoch a is less than our epoch b
2010-06-02 17:04:56,486 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@82]
- Exception when following the leader
java.io.IOException: Error: Epoch of leader is lower
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2010-06-02 17:04:56,486 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@166]
- shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
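
For reference, the zxids in the log above can be decoded by hand. This is only
a rough sketch in Python, assuming the usual ZooKeeper layout where the high 32
bits of a zxid are the epoch and the low 32 bits are a counter, and assuming
the second field of each Notification line is the proposed zxid. The helper
name split_zxid is made up for this example.

```python
# Assumption: a zxid is a 64-bit value, epoch in the high 32 bits,
# per-epoch counter in the low 32 bits.
EPOCH_SHIFT = 32
COUNTER_MASK = (1 << EPOCH_SHIFT) - 1


def split_zxid(zxid):
    """Return (epoch, counter) for a 64-bit zxid (hypothetical helper)."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


# The two zxid values that appear in the log excerpt.
samples = [
    ("node 1 proposal", 47244640287),
    ("notifications naming node 3", 38654707048),
]

for label, zxid in samples:
    epoch, counter = split_zxid(zxid)
    print(f"{label}: epoch=0x{epoch:x} counter=0x{counter:x}")
# prints:
# node 1 proposal: epoch=0xb counter=0x1f
# notifications naming node 3: epoch=0x9 counter=0x568
```

If those assumptions hold, node one is proposing a zxid in epoch b, while the
zxid the other two nodes are voting on is in epoch 9. That would at least be
consistent with the "Leader epoch a is less than our epoch b" message, if the
new leader's epoch is one past 9.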
All I can find is this,
http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00449.html,
which implies that this state should never happen.
Any suggestions? If it happens again, I'll just have to roll everything back
to 3.2.1 and live with the client crashes.