Hi Vishal,
There are periodic pings sent from the leader to the followers.

Take a look at Leader.java:

                synchronized (learners) {
                    for (LearnerHandler f : learners) {
                        if (f.synced()) {

This code sends periodic pings to the followers to make sure they are
running fine. On the follower side, we should keep track of these pings and
give up following the leader if we haven't received a ping packet from it
for a long time. This is definitely worth fixing, since we pride ourselves
on being a highly available and reliable service.
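Roughly, the follower-side check could look like the sketch below. It is only an illustration under stated assumptions: LeaderPingMonitor and its methods are hypothetical names (not existing ZooKeeper code), and the timeout is assumed to be syncLimit * tickTime, as suggested in the mail quoted below.

```java
/**
 * Hypothetical follower-side monitor (not part of ZooKeeper): records when
 * the last PING from the leader was seen and reports the leader as
 * unresponsive once no ping has arrived for longer than syncLimit * tickTime.
 */
public class LeaderPingMonitor {
    private final long timeoutMs;
    private volatile long lastPingMs;

    public LeaderPingMonitor(int tickTimeMs, int syncLimit, long nowMs) {
        // Assumed limit: syncLimit ticks of tickTime milliseconds each.
        this.timeoutMs = (long) tickTimeMs * syncLimit;
        // Treat "now" (e.g. the end of the sync phase) as the last contact.
        this.lastPingMs = nowMs;
    }

    /** Record that a PING packet from the leader was processed at nowMs. */
    public void onPing(long nowMs) {
        lastPingMs = nowMs;
    }

    /** True if the leader has been silent for longer than the timeout. */
    public boolean leaderTimedOut(long nowMs) {
        return nowMs - lastPingMs > timeoutMs;
    }

    public long timeoutMs() {
        return timeoutMs;
    }

    public static void main(String[] args) {
        // tickTime = 2000 ms, syncLimit = 5 -> more than 10 s without a ping is a failure.
        LeaderPingMonitor m = new LeaderPingMonitor(2000, 5, 0L);
        m.onPing(4_000);
        System.out.println(m.leaderTimedOut(13_000)); // false: 9 s since the last ping
        System.out.println(m.leaderTimedOut(15_000)); // true: 11 s since the last ping
    }
}
```

In Follower.followLeader(), the idea would be to call onPing() where processPacket() handles a ping from the leader, check leaderTimedOut() after each readPacket(), and shut down the follower (going back to leader election) when it returns true. Passing the current time in explicitly keeps the check deterministic and easy to unit test.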

Please feel free to open a JIRA and work on it.
3.4 would be a good target for this.


On 11/10/10 12:26 PM, "Vishal Kher" <vishalm...@gmail.com> wrote:

> Hi,
> In Follower.followLeader() after syncing with the leader, the follower does:
>                 while (self.isRunning()) {
>                     readPacket(qp);
>                     processPacket(qp);
>                 }
> It looks like it relies on socket timeout expiry to figure out if the
> connection with the leader has gone down. So a follower *with no clients*
> may never notice a faulty leader if the leader has a software hang but the
> TCP connections with the peers are still valid. Since it has no clients, it
> won't heartbeat with the leader. If a majority of the followers are not
> connected to any clients, then even if the other followers detect that the
> leader is unresponsive and attempt to elect a new leader, they will not be
> able to form a quorum.
> Please correct me if I am wrong. If I am right, should we add code at the
> follower to monitor the heartbeat messages that it receives from the
> leader and take action if it misses heartbeats for longer than
> (syncLimit * tickTime)? This is certainly a hypothetical case; however, I
> think it is worth fixing.
> Thanks.
> -Vishal
