Follower should stop following and start FLE if it does not receive pings from 
the leader
-----------------------------------------------------------------------------------------

                 Key: ZOOKEEPER-928
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928
             Project: Zookeeper
          Issue Type: Bug
    Affects Versions: 3.3.2
            Reporter: Vishal K
             Fix For: 3.4.0


In Follower.followLeader() after syncing with the leader, the follower does:
                while (self.isRunning()) {
                    readPacket(qp);
                    processPacket(qp);
                }

It looks like it relies on socket timeout expiry to figure out if the 
connection with the leader has gone down.  So a follower *with no cilents* may 
never notice a faulty leader if a Leader has a software hang, but the TCP 
connections with the peers are still valid. Since it has no cilents, it won't 
hearbeat with the Leader. If majority of followers are not connected to any 
clients, then FLE will fail even if other followers attempt to elect a new 
leader.

We should keep track of pings received from the leader and see if we havent seen
a ping packet from the leader for (syncLimit * tickTime) time and give up 
following the
leader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to