Flavio Paiva Junqueira commented on ZOOKEEPER-790:

Hi Vishal, thanks for all the information. I haven't been able to reproduce it 
yet, but here are some thoughts after looking over your logs again:

1- It is not a problem that server 0 declares itself leader even though 
another leader is already running. The other servers will ignore server 0, and 
it will eventually drop its leadership, as you have observed;
2- The notifications from servers 1 and 2 say LOOKING because they were queued 
while 1 and 2 were still looking for a leader. That's not an issue;
3- I don't understand why the patch doesn't work. Let me tell you how I'm 
interpreting your run. Server 0 receives the notifications from 1 and 2 and 
decides that it should be the leader. Because the current trunk code sets the 
first zxid of the new epoch before hearing from a quorum, server 0 has a higher 
zxid than everyone else once it drops leadership. Consequently, it correctly 
refuses to talk to the current leader, whose epoch now looks smaller than its 
own. Setting the first zxid of the new epoch prematurely is the actual problem, 
and the patch I have uploaded should fix it, so I can't explain why it does not. 
Have you made sure to apply it before running your new tests? Either way, I 
would appreciate it if you could upload logs from a run with the current 790 


> Last processed zxid set prematurely while establishing leadership
> -----------------------------------------------------------------
>                 Key: ZOOKEEPER-790
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.1
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Flavio Paiva Junqueira
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>         Attachments: ZOOKEEPER-790.patch
> The leader code sets the last processed zxid to the first zxid of the new 
> epoch before connecting to a quorum of followers (Leader.java:281), and the 
> follower code throws an IOException (Follower.java:73) if the leader epoch 
> is smaller than its own. As a result, when the false leader drops leadership 
> and becomes a follower, it finds a leader with a smaller epoch and kills 
> itself.
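
To make the failure mode in the description concrete, here is a minimal Java 
sketch. It is not the code in Leader.java or Follower.java: the class and 
method names are hypothetical, and the follower check is reduced to a plain 
epoch comparison. The only ZooKeeper fact it relies on is that a zxid keeps 
the epoch in its high 32 bits and a counter in its low 32 bits.

```java
// Illustrative sketch only: ZxidSketch and its methods are hypothetical names,
// not ZooKeeper's actual classes or API.
class ZxidSketch {
    // A zxid packs the epoch into the high 32 bits and a counter into the low 32 bits.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // The behavior the issue describes: a would-be leader moves its last
    // processed zxid to the first zxid of a new epoch, here without waiting
    // for a quorum of followers.
    static long firstZxidOfNextEpoch(long lastProcessedZxid) {
        return makeZxid(epochOf(lastProcessedZxid) + 1, 0);
    }

    // Simplified stand-in for the follower-side check: a follower refuses a
    // leader whose epoch is smaller than the epoch of its own last zxid.
    static boolean followerAccepts(long leaderZxid, long ownLastProcessedZxid) {
        return epochOf(leaderZxid) >= epochOf(ownLastProcessedZxid);
    }

    public static void main(String[] args) {
        long establishedLeader = makeZxid(5, 42); // the leader the quorum follows
        long server0 = makeZxid(5, 40);           // server 0 before declaring itself leader

        // Before the premature update, server 0 could rejoin as a follower.
        System.out.println(followerAccepts(establishedLeader, server0)); // prints "true"

        // Server 0 declares itself leader and moves to the next epoch too early.
        long server0AfterBump = firstZxidOfNextEpoch(server0);

        // After dropping leadership it sees a leader with a smaller epoch and refuses it.
        System.out.println(followerAccepts(establishedLeader, server0AfterBump)); // prints "false"
    }
}
```

Under these assumptions, the fix amounts to delaying the call that moves to 
the new epoch until a quorum of followers has connected, so a server that 
never gathers a quorum keeps its old epoch and can rejoin as a follower.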
