Hi,

In Follower.followLeader() after syncing with the leader, the follower does:
    while (self.isRunning()) {
        readPacket(qp);
        processPacket(qp);
    }

It looks like the follower relies on socket timeout expiry to figure out if
the connection with the leader has gone down.  So a follower *with no
clients* may never notice a faulty leader if the leader has a software hang
but the TCP connections with its peers are still valid. Since it has no
clients, it won't heartbeat with the leader. If a majority of the followers
are not connected to any clients, then even if the other followers detect
that the leader is unresponsive and attempt to elect a new leader, they will
not be able to form a quorum.

Please correct me if I am wrong. If I am not mistaken, should we add code at
the follower to monitor the heartbeat messages that it receives from the
leader and take action if it misses heartbeats for time > (syncLimit *
tickTime)? This certainly is a hypothetical case, however, I think it is
worth a fix.
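
Here is a rough sketch of the kind of check I have in mind. The class and
method names are hypothetical (nothing here exists in the ZooKeeper code
base), and timestamps are passed in explicitly only to keep the example easy
to test:

```java
// Hypothetical helper, not part of ZooKeeper: remembers when the follower
// last heard from the leader and reports when the silence has lasted longer
// than syncLimit * tickTime.
final class LeaderLivenessMonitor {
    private final long timeoutMs;
    private long lastHeardMs;

    LeaderLivenessMonitor(int tickTimeMs, int syncLimit, long nowMs) {
        // Widen to long before multiplying to avoid int overflow.
        this.timeoutMs = (long) tickTimeMs * syncLimit;
        this.lastHeardMs = nowMs;
    }

    // Call whenever a packet (for example a heartbeat) arrives from the leader.
    void recordPacket(long nowMs) {
        lastHeardMs = nowMs;
    }

    // True once the leader has been silent for more than the timeout.
    boolean leaderTimedOut(long nowMs) {
        return nowMs - lastHeardMs > timeoutMs;
    }
}
```

The follower loop could call recordPacket() after each readPacket(qp) and
leave the loop once leaderTimedOut() returns true. Since readPacket(qp)
blocks, the check would probably have to run from a separate thread or after
a read timeout. That part is an assumption on my side.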

Thanks.
-Vishal
