Mahadev konar commented on ZOOKEEPER-928:

 Here is the definition of setSoTimeout -

public void setSoTimeout(int timeout)
                  throws SocketException
Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With 
this option set to a non-zero timeout, a read() call on the InputStream 
associated with this Socket will block for only this amount of time. If the 
timeout expires, a java.net.SocketTimeoutException is raised, though the Socket 
is still valid. The option must be enabled prior to entering the blocking 
operation to have effect. The timeout must be > 0. A timeout of zero is 
interpreted as an infinite timeout.

This means is that the read would block till timeout and throw an exception if 
it doesnt hear from the leader during that time. Wouldnt this suffice?

> Follower should stop following and start FLE if it does not receive pings 
> from the leader
> -----------------------------------------------------------------------------------------
>                 Key: ZOOKEEPER-928
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.3.2
>            Reporter: Vishal K
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
> In Follower.followLeader() after syncing with the leader, the follower does:
>                 while (self.isRunning()) {
>                     readPacket(qp);
>                     processPacket(qp);
>                 }
> It looks like it relies on socket timeout expiry to figure out if the 
> connection with the leader has gone down.  So a follower *with no cilents* 
> may never notice a faulty leader if a Leader has a software hang, but the TCP 
> connections with the peers are still valid. Since it has no cilents, it won't 
> hearbeat with the Leader. If majority of followers are not connected to any 
> clients, then FLE will fail even if other followers attempt to elect a new 
> leader.
> We should keep track of pings received from the leader and see if we havent 
> seen
> a ping packet from the leader for (syncLimit * tickTime) time and give up 
> following the
> leader.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Reply via email to