Henry Robinson updated ZOOKEEPER-690:

    Attachment: ZOOKEEPER-690.patch

I have found what I hope is the problem.

Because QuorumPeers duplicate their 'LearnerType' in two places, there's the 
possibility that the two copies may get out of sync. This is what was happening 
here - it was a test bug. Although the Observers knew that they were Observers, 
the other nodes did not. This affected the leader election protocol, as the 
other nodes did not know to reject an Observer.
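
To make the mismatch concrete, here is a minimal sketch of how the two copies can disagree. It is a simplified model only: the names ServerEntry, Peer, isConsistent and acceptsVoteFrom are hypothetical and are not the actual QuorumPeer / QuorumServer code or the election logic in the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model for illustration; not the real ZooKeeper classes.
enum LearnerType { PARTICIPANT, OBSERVER }

// Copy 2 of the type lives here: one entry per server in the shared view.
class ServerEntry {
    final long id;
    final LearnerType type;

    ServerEntry(long id, LearnerType type) {
        this.id = id;
        this.type = type;
    }
}

class Peer {
    final long myId;
    final LearnerType learnerType;      // copy 1: the peer's own idea of its type
    final Map<Long, ServerEntry> view;  // includes an entry for this peer as well

    Peer(long myId, LearnerType learnerType, Map<Long, ServerEntry> view) {
        this.myId = myId;
        this.learnerType = learnerType;
        this.view = view;
    }

    // The invariant: the peer's own type must match its entry in the view.
    boolean isConsistent() {
        ServerEntry self = view.get(myId);
        return self != null && self.type == learnerType;
    }

    // Assumed election-side check: only participants may be voted for.
    boolean acceptsVoteFrom(long sid) {
        ServerEntry e = view.get(sid);
        return e != null && e.type == LearnerType.PARTICIPANT;
    }
}

class LearnerTypeSketch {
    public static void main(String[] args) {
        // A test builds the view and forgets to mark server 3 as an Observer.
        Map<Long, ServerEntry> view = new HashMap<>();
        view.put(1L, new ServerEntry(1L, LearnerType.PARTICIPANT));
        view.put(2L, new ServerEntry(2L, LearnerType.PARTICIPANT));
        view.put(3L, new ServerEntry(3L, LearnerType.PARTICIPANT));

        Peer participant = new Peer(1L, LearnerType.PARTICIPANT, view);
        Peer observer = new Peer(3L, LearnerType.OBSERVER, view);

        // Server 3 knows it is an Observer, but its entry in the view disagrees.
        System.out.println("observer consistent: " + observer.isConsistent());
        // Server 1 consults the view, so it does not know to reject server 3.
        System.out.println("peer 1 accepts vote from 3: " + participant.acceptsVoteFrom(3L));
    }
}
```

In this model, a check like isConsistent() at start-up would flag the broken invariant before an election runs.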

I feel like we should refactor the QuorumPeer.QuorumServer code so as not to 
duplicate information, but for the time being I think this patch will work. 

I have also taken the opportunity to standardise the naming of 'learnertype' 
throughout the code (in some places it was called 'peertype', adding to the 
confusion).

Tests pass on my machine, but I can't guarantee that the problem is fixed as I 
could never recreate the error.

Thanks to Flavio for catching the broken invariant!

> AsyncTestHammer test fails on hudson.
> -------------------------------------
>                 Key: ZOOKEEPER-690
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
>             Project: Zookeeper
>          Issue Type: Bug
>            Reporter: Mahadev konar
>            Assignee: Henry Robinson
>            Priority: Blocker
>             Fix For: 3.3.1, 3.4.0
>         Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch
> The hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
> There is a huge set of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for the failure.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
