[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888688#action_12888688 ]

Travis Crawford commented on ZOOKEEPER-790:
-------------------------------------------

I accidentally posted this in ZOOKEEPER-335 -- reposting here. Sorry for the 
posting mixup -- the content is correct though :)


Unfortunately I still observed the "Leader epoch" issue and had to manually
force a leader election for the cluster to recover. This test was performed
on the following base release plus patches, applied in the order listed:

Zookeeper 3.3.1
ZOOKEEPER-744
ZOOKEEPER-790

{code}
2010-07-15 02:43:57,181 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82] - Reading snapshot /data/zookeeper/version-2/snapshot.2300001ac2
2010-07-15 02:43:57,384 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id =  1, Proposed zxid = 154618826848
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1, 154618826848, 4, 1, LOOKING, LOOKING, 1
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 2, 146030952153, 3, 1, LOOKING, LEADING, 2
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 2, 146030952153, 3, 1, LOOKING, FOLLOWING, 3
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:zookeeperser...@151] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/txlog/version-2 snapdir /data/zookeeper/version-2
2010-07-15 02:43:57,387 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] - Leader epoch 23 is less than our epoch 24
2010-07-15 02:43:57,387 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@82] - Exception when following the leader
java.io.IOException: Error: Epoch of leader is lower
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2010-07-15 02:43:57,387 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@166] - shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
{code}
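A zxid packs the epoch into its high 32 bits and a per-epoch counter into its low 32 bits, so the decimal zxids in the log above are easier to compare once split. The sketch below does that split; the class name ZxidDecode is hypothetical and not part of ZooKeeper. Decoding shows that server 1's proposed zxid is already in epoch 0x24 while its snapshot is from epoch 0x23, which suggests the "Leader epoch 23 is less than our epoch 24" message prints epochs in hex.

```java
// Minimal sketch: split a ZooKeeper zxid into epoch (high 32 bits) and
// counter (low 32 bits). ZxidDecode is a hypothetical helper class.
final class ZxidDecode {
    private ZxidDecode() {}

    /** Epoch: the high-order 32 bits of the zxid. */
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    /** Counter: the low-order 32 bits of the zxid. */
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    static String describe(long zxid) {
        return String.format("zxid 0x%x -> epoch 0x%x, counter 0x%x",
                zxid, epoch(zxid), counter(zxid));
    }

    public static void main(String[] args) {
        // Values copied from the log above.
        long proposed = 154618826848L;  // "Proposed zxid" of server 1
        long peer = 146030952153L;      // zxid carried in server 2's notifications
        long snapshot = 0x2300001ac2L;  // suffix of snapshot.2300001ac2
        System.out.println(describe(proposed));  // epoch 0x24, counter 0x1060
        System.out.println(describe(peer));      // epoch 0x22, counter 0x1f7ed9
        System.out.println(describe(snapshot));  // epoch 0x23, counter 0x1ac2
    }
}
```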


I followed the reproduction recipe @vishal provided:

(a) Stop one follower in a three node cluster
(b) Get some tea while it falls behind
(c) Start the node stopped in (a).

These log lines show when the follower was stopped and when it was turned
back on: the first is the last entry before the stop, the second is from the
restart.

{code}
2010-07-15 02:35:36,398 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:nioserverc...@1661] - Established session 0x229aa13cfc6276b with negotiated timeout 10000 for client /10.209.45.114:34562
2010-07-15 02:39:18,907 - INFO  [main:quorumpeercon...@90] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg
{code}


This is the first "Leader epoch" line. Everything between the restart shown
above and this line is the interesting part.


{code}
2010-07-15 02:39:43,339 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] - Leader epoch 23 is less than our epoch 24
{code}

> Last processed zxid set prematurely while establishing leadership
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-790
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.1
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Flavio Paiva Junqueira
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2
>
>
> The leader code sets the last processed zxid to the first zxid of the new
> epoch before it has connected to a quorum of followers (Leader.java:281).
> Because the follower code throws an IOException (Follower.java:73) if the
> leader's epoch is smaller than its own, a false leader that drops leadership
> and becomes a follower finds a smaller epoch and kills itself.
