Hi,

In Follower.followLeader(), after syncing with the leader, the follower does:

    while (self.isRunning()) {
        readPacket(qp);
        processPacket(qp);
    }

It looks like it relies on socket timeout expiry to figure out whether the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the leader has a software hang but its TCP connections with the peers are still valid. Since the follower has no clients, it won't heartbeat with the leader. If a majority of followers are not connected to any clients, then even if the other followers detect that the leader is unresponsive and attempt to elect a new leader, the election cannot succeed, because the majority still considers the old leader alive.

Please correct me if I am wrong. If I am not mistaken, should we add code at the follower to monitor the heartbeat messages that it receives from the leader, and take action if it misses heartbeats for a time > (syncLimit * tickTime)? This is certainly a hypothetical case; however, I think it is worth a fix.

Thanks.
-Vishal
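
P.S. To make the proposal concrete, here is a rough sketch of the kind of check I have in mind. LeaderLivenessMonitor and its methods are hypothetical names, not existing ZooKeeper code, and I am assuming that any packet received from the leader counts as a sign of life. It only tracks the time of the last packet and compares the silence against syncLimit * tickTime:

```java
// Hypothetical helper, not part of ZooKeeper: remembers when the follower
// last heard from the leader and reports whether the silence is too long.
final class LeaderLivenessMonitor {
    private final long timeoutMs;
    // volatile so that a separate watchdog thread can read the value
    // written by the thread that reads packets from the leader.
    private volatile long lastHeardMs;

    LeaderLivenessMonitor(int tickTimeMs, int syncLimit, long nowMs) {
        this.timeoutMs = (long) tickTimeMs * syncLimit;
        this.lastHeardMs = nowMs;
    }

    // Call whenever a packet (PING or otherwise) arrives from the leader.
    void recordPacket(long nowMs) {
        lastHeardMs = nowMs;
    }

    // True once the leader has been silent for more than syncLimit * tickTime.
    boolean leaderTimedOut(long nowMs) {
        return nowMs - lastHeardMs > timeoutMs;
    }
}
```

The follower loop would call recordPacket() after each readPacket(qp). Since readPacket() blocks, leaderTimedOut() would probably have to be polled from a separate thread (or after a read timeout), and what "take action" should mean, for example dropping the connection and going back to leader election, is still open.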