Two possible race conditions during leader establishment
--------------------------------------------------------

                 Key: ZOOKEEPER-1194
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1194
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.0
            Reporter: Alexander Shraer
            Assignee: Alexander Shraer
             Fix For: 3.4.0


Leader.getEpochToPropose() and Leader.waitForNewEpoch() act as barriers: a 
leader or follower can return from either method only once connectingFollowers 
(or electingFollowers, respectively) contains a quorum. However, these methods 
don't check that the leader itself is in connectingFollowers/electingFollowers, 
so the leader has not necessarily reached the barrier when followers pass it. 
This can cause the following problems:

1. If the leader is not in connectingFollowers when a LearnerHandler returns 
from getEpochToPropose(), then the epoch sent by the leader to the follower 
might be smaller than the leader's own last accepted epoch.

2. If the leader is not in electingFollowers when a LearnerHandler returns from 
waitForNewEpoch(), then the leader will send a NEWLEADER message to followers 
and the followers will respond. The NEWLEADER message may not yet be in 
outstandingProposals when these acks arrive, in which case the NEWLEADER acks 
are dropped.


To fix this, I propose explicitly checking that the leader is in 
connectingFollowers/electingFollowers before anyone can pass these barriers.
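
The proposed check could look roughly like the sketch below. It is only an 
illustration, not the actual patch: the class EpochBarrier, the fields selfId 
and ensembleSize, the simple-majority quorum test, and the timeout handling are 
all assumptions made for the example. Only the method name getEpochToPropose 
and the idea of a connectingFollowers set come from the description above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the barrier logic in Leader; not ZooKeeper code.
class EpochBarrier {
    private final long selfId;       // assumed: server id of the leader itself
    private final int ensembleSize;  // assumed: number of voting servers
    private final Set<Long> connectingFollowers = new HashSet<>();
    private long epoch = -1;
    private boolean waitingForNewEpoch = true;

    EpochBarrier(long selfId, int ensembleSize) {
        this.selfId = selfId;
        this.ensembleSize = ensembleSize;
    }

    // Assumption: a quorum is a simple majority of the ensemble.
    private boolean containsQuorum() {
        return connectingFollowers.size() > ensembleSize / 2;
    }

    synchronized long getEpochToPropose(long sid, long lastAcceptedEpoch, long timeoutMs)
            throws InterruptedException, TimeoutException {
        if (!waitingForNewEpoch) {
            return epoch;  // barrier already released
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        connectingFollowers.add(sid);
        // The proposed check: a quorum alone is not enough, the leader
        // itself must also have arrived at the barrier.
        if (connectingFollowers.contains(selfId) && containsQuorum()) {
            waitingForNewEpoch = false;
            notifyAll();
            return epoch;
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (waitingForNewEpoch) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new TimeoutException("no quorum including the leader");
            }
            wait(remaining);
        }
        return epoch;
    }

    // Demo: two followers (ids 2 and 3) arrive before the leader (id 1).
    // Returns the epochs seen by {leader, follower 2, follower 3}.
    static long[] demo() {
        EpochBarrier barrier = new EpochBarrier(1L, 3);
        long[] got = new long[3];
        Thread[] followers = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int idx = i;
            followers[i] = new Thread(() -> {
                try {
                    // Followers last accepted epochs 3 and 4.
                    got[idx + 1] = barrier.getEpochToPropose(idx + 2L, 3L + idx, 5000);
                } catch (Exception e) {
                    got[idx + 1] = -1;
                }
            });
            followers[i].start();
        }
        try {
            Thread.sleep(200);  // give both followers time to reach the barrier
            if (!followers[0].isAlive() || !followers[1].isAlive()) {
                throw new IllegalStateException("a follower passed before the leader");
            }
            got[0] = barrier.getEpochToPropose(1L, 5L, 5000);  // leader accepted epoch 5
            followers[0].join();
            followers[1].join();
        } catch (InterruptedException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
        return got;
    }

    public static void main(String[] args) {
        long[] got = demo();
        System.out.println(got[0] + " " + got[1] + " " + got[2]);
    }
}
```

In this demo the two followers already form a majority of the three-server 
ensemble, so without the connectingFollowers.contains(selfId) test they would 
leave the barrier with epoch 5, which does not exceed the leader's own accepted 
epoch of 5. With the test in place they block until the leader joins, and all 
three callers see epoch 6. The same pattern would apply to the 
electingFollowers barrier.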






--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
