Yes, that's what I was planning to do. At the follower, start FLE (fast
leader election) if the follower does not receive a ping for
> (syncLimit * tickTime).
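
Roughly, I am thinking of something like the sketch below. LeaderPingMonitor
and its methods are hypothetical names, not existing ZooKeeper classes, and
the caller passes in the current time (an assumption I made so the timeout
logic can be checked without sleeping).

```java
/**
 * Hypothetical helper (not part of ZooKeeper): remembers when the follower
 * last saw a ping from the leader and reports whether the leader has been
 * silent for longer than syncLimit * tickTime.
 */
final class LeaderPingMonitor {
    private final long timeoutMs;
    private long lastPingMs;

    /**
     * @param tickTimeMs length of one tick in milliseconds
     * @param syncLimit  number of ticks the leader may stay silent
     * @param nowMs      current time in milliseconds, supplied by the caller
     */
    LeaderPingMonitor(int tickTimeMs, int syncLimit, long nowMs) {
        if (tickTimeMs <= 0 || syncLimit <= 0) {
            throw new IllegalArgumentException("tickTime and syncLimit must be positive");
        }
        // Widen to long before multiplying so large settings cannot overflow.
        this.timeoutMs = (long) tickTimeMs * syncLimit;
        this.lastPingMs = nowMs;
    }

    /** Call whenever a ping packet from the leader is processed. */
    synchronized void recordPing(long nowMs) {
        lastPingMs = nowMs;
    }

    /** True once the leader has been silent for strictly more than the timeout. */
    synchronized boolean leaderTimedOut(long nowMs) {
        return nowMs - lastPingMs > timeoutMs;
    }

    long timeoutMs() {
        return timeoutMs;
    }
}
```

The follower would call recordPing() each time it handles a ping from the
leader, and a periodic check would leave followLeader() and start FLE once
leaderTimedOut() returns true. That check would have to run somewhere that is
not itself blocked inside readPacket().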


On Wed, Nov 10, 2010 at 2:48 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:

> Hi Vishal,
>  There are periodic pings sent from the leader to the followers.
>
> Take a look at Leader.java:
>
>                syncedSet.add(self.getId());
>                synchronized (learners) {
>                    for (LearnerHandler f : learners) {
>                        if (f.synced()) {
>                            syncedCount++;
>                            syncedSet.add(f.getSid());
>                        }
>                        f.ping();
>                    }
>                }
>
>
> This code sends periodic pings to the followers to make sure they are
> running fine. We should keep track of these pings at the follower and give
> up following the leader if we haven't seen a ping packet from it for a long
> time. This is definitely worth fixing since we pride ourselves on being a
> highly available and reliable service.
>
> Please feel free to open a jira and work on it.
> 3.4 would be a good target for this.
>
> Thanks
> mahadev
>
> On 11/10/10 12:26 PM, "Vishal Kher" <vishalm...@gmail.com> wrote:
>
> > Hi,
> >
> > In Follower.followLeader(), after syncing with the leader, the follower
> > does:
> >                 while (self.isRunning()) {
> >                     readPacket(qp);
> >                     processPacket(qp);
> >                 }
> >
> > It looks like it relies on socket timeout expiry to figure out if the
> > connection with the leader has gone down. So a follower *with no clients*
> > may never notice a faulty leader if the leader has a software hang but the
> > TCP connections with the peers are still valid. Since it has no clients,
> > it won't heartbeat with the leader. If a majority of followers are not
> > connected to any clients, then even if the other followers attempt to elect
> > a new leader after detecting that the leader is unresponsive, they will not
> > be able to form a quorum.
> >
> > Please correct me if I am wrong. If I am not mistaken, should we add code
> > at the follower to monitor the heartbeat messages that it receives from
> > the leader and take action if it misses heartbeats for time > (syncLimit *
> > tickTime)? This is certainly a hypothetical case; however, I think it is
> > worth a fix.
> >
> > Thanks.
> > -Vishal
> >
>
>
