[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888684#action_12888684
 ] 

Travis Crawford commented on ZOOKEEPER-335:
-------------------------------------------

Unfortunately I still observed the "Leader epoch" issue and needed to manually 
force a leader election for the cluster to recover. This test was performed 
with the following base+patches, applied in the order listed.

Zookeeper 3.3.1
ZOOKEEPER-744
ZOOKEEPER-790


{code}
2010-07-15 02:43:57,181 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82] 
- Reading snapshot /data/zookeeper/version-2/snapshot.2300001ac2
2010-07-15 02:43:57,384 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id 
=  1, Proposed zxid = 154618826848
2010-07-15 02:43:57,385 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1, 
154618826848, 4, 1, LOOKING, LOOKING, 1
2010-07-15 02:43:57,385 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 2, 
146030952153, 3, 1, LOOKING, LEADING, 2
2010-07-15 02:43:57,385 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 2, 
146030952153, 3, 1, LOOKING, FOLLOWING, 3
2010-07-15 02:43:57,385 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
2010-07-15 02:43:57,385 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:zookeeperser...@151] - Created server with 
tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir 
/data/zookeeper/txlog/version-2 snapdir /data/zookeeper/version-2
2010-07-15 02:43:57,387 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] 
- Leader epoch 23 is less than our epoch 24
2010-07-15 02:43:57,387 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@82] 
- Exception when following the leader 
java.io.IOException: Error: Epoch of leader is lower
        at 
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2010-07-15 02:43:57,387 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@166] 
- shutdown called 
java.lang.Exception: shutdown Follower
        at 
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
{code}


I followed the recipe @vishal provided for recreating.

(a) Stop one follower in a three node cluster
(b) Get some tea while it falls behind
(c) Start the node stopped in (a).


These timestamps show where the follower was stopped. It also shows when it was 
turned back on.

{code}
2010-07-15 02:35:36,398 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:nioserverc...@1661] - Established session 
0x229aa13cfc6276b with negotiated timeout 10000 for client /10.209.45.114:34562
2010-07-15 02:39:18,907 - INFO  [main:quorumpeercon...@90] - Reading 
configuration from: /etc/zookeeper/conf/zoo.cfg
{code}


This timestamp is the first ``Leader epoch`` line. Everything between these two 
points will be the interesting bits.

{code}
2010-07-15 02:39:43,339 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] 
- Leader epoch 23 is less than our epoch 24
{code}

> zookeeper servers should commit the new leader txn to their logs.
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-335
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.1.0
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>            Priority: Blocker
>             Fix For: 3.4.0
>
>         Attachments: faultynode-vishal.txt, zk.log.gz, zklogs.tar.gz
>
>
> currently the zookeeper followers do not commit the new leader election. This 
> will cause problems in a failure scenarios with a follower acking to the same 
> leader txn id twice, which might be two different intermittent leaders and 
> allowing them to propose two different txn's of the same zxid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to