Hi Charity, unfortunately this is a known issue not specific to 3.3 that we are working to address. See this thread for some background:

http://zookeeper-user.578899.n2.nabble.com/odd-error-message-td4933761.html

I've raised the JIRA level to "blocker" to ensure we address this asap.

As Ted suggested, you can remove the datadir -- only on the affected server -- and then restart it. That should resolve the issue (the server will download a snapshot of the current db from the leader).
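
A minimal sketch of that cleanup step, in case it helps. It is not an official tool. It assumes the ZooKeeper process on the affected server is already stopped, and that the snapshots and transaction logs live in a version-2 subdirectory of the configured dataDir (as the paths in the log below suggest). It moves that subdirectory aside instead of deleting it, so the old state can still be inspected, and it leaves the myid file alone. The function name and the backup location are hypothetical.

```python
import shutil
import time
from pathlib import Path


def set_aside_version_dir(data_dir, backup_root):
    """Move <data_dir>/version-2 into backup_root and return the new path.

    Assumption: run only on the affected server, and only while its
    ZooKeeper process is stopped. The myid file in data_dir is untouched.
    """
    version_dir = Path(data_dir) / "version-2"
    if not version_dir.is_dir():
        raise FileNotFoundError(f"no version-2 directory under {data_dir}")

    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Timestamped name so repeated runs do not collide with older backups.
    target = backup_root / f"version-2.{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.move(str(version_dir), str(target))
    return target
```

For the layout in this thread the call would look like set_aside_version_dir("/services/zookeeper/data/zookeeper", "/tmp/zk-backup"), where /tmp/zk-backup is only a placeholder. After that, restart the server and let it sync from the leader.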

Patrick

On 06/02/2010 11:11 AM, Charity Majors wrote:
I upgraded my zookeeper cluster last week from 3.2.1 to 3.3.1, in an attempt to 
get away from a client bug that was crashing my backend services.

Unfortunately, this morning I had a server crash, and it brought down my entire 
cluster.  I don't have the logs leading up to the crash, because -- 
argghffbuggle -- log4j wasn't set up correctly.  But I restarted all three 
nodes, and nodes two and three came back up and formed a quorum.

Node one, meanwhile, does this:

2010-06-02 17:04:56,446 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
2010-06-02 17:04:56,446 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82] 
- Reading snapshot 
/services/zookeeper/data/zookeeper/version-2/snapshot.a00000045
2010-06-02 17:04:56,476 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id 
=  1, Proposed zxid = 47244640287
2010-06-02 17:04:56,486 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1, 
47244640287, 4, 1, LOOKING, LOOKING, 1
2010-06-02 17:04:56,486 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 
38654707048, 3, 1, LOOKING, LEADING, 3
2010-06-02 17:04:56,486 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 
38654707048, 3, 1, LOOKING, FOLLOWING, 2
2010-06-02 17:04:56,486 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
2010-06-02 17:04:56,486 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:zookeeperser...@151] - Created server with 
tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir 
/services/zookeeper/data/zookeeper/version-2 snapdir 
/services/zookeeper/data/zookeeper/version-2
2010-06-02 17:04:56,486 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] 
- Leader epoch a is less than our epoch b
2010-06-02 17:04:56,486 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@82] 
- Exception when following the leader
java.io.IOException: Error: Epoch of leader is lower
        at 
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@166] 
- shutdown called
java.lang.Exception: shutdown Follower
        at 
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
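
For readers following the numbers in the log above: a zxid is a 64-bit value whose high 32 bits are the epoch and whose low 32 bits are a counter within that epoch. The sketch below (the helper name split_zxid is made up for illustration) decodes the two zxids that appear in the election messages.

```python
def split_zxid(zxid):
    """Return (epoch, counter) for a 64-bit ZooKeeper zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# The zxids below are copied from the election log lines above.
for label, zxid in [("node 1 proposed", 47244640287),
                    ("peer notification", 38654707048)]:
    epoch, counter = split_zxid(zxid)
    print(f"{label}: epoch=0x{epoch:x} counter={counter}")

# prints:
# node 1 proposed: epoch=0xb counter=31
# peer notification: epoch=0x9 counter=1384
```

The epoch 0xb decoded from node one's proposed zxid matches the "our epoch b" in the FATAL line, while the zxid in the notifications from the other two nodes is still in epoch 9.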



All I can find is this, 
http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00449.html, 
which implies that this state should never happen.

Any suggestions?  If it happens again, I'll just have to roll everything back 
to 3.2.1 and live with the client crashes.



