Stu Hood updated ZOOKEEPER-157:
This patch definitely resolves the nasty issue I was seeing yesterday that
generated the log I attached to #62 (when applied to r701567).
One problem I'm still noticing is that whenever I lose a follower, the leader
and remaining follower start eating up the entire CPU. I've attached configs
and stack traces.
> Peer can't find existing leader
> Key: ZOOKEEPER-157
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-157
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Priority: Critical
> Attachments: dead-follower.tar.gz, ZOOKEEPER-157.patch
> In the patch of JIRA 127, I forgot to set the state of a peer when this peer
> is looking for a leader and it receives a message from the current leader. In
> this patch, I have fixed this problem, and also returned to what we had
> previously. With this current patch, when a peer joins and there is already a
> leader elected, the joining peer will only recognize the new leader as the
> leader once it receives a confirmation from a majority. The alternative is to
> set the leader once we receive a message from a peer claiming to be the
> leader (what we have on trunk now, although broken because we don't set the
> state of the peer), but there could be cases in which a peer believes itself
> to be the leader although it no longer is, and the joining peer would
> select this false leader as its leader. Eventually, the false leader would
> time out, and both processes would select the correct leader. This small fix
> gets rid of such problems, though.
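
The majority rule described above can be sketched roughly as follows. This is only an illustration and not the code in ZOOKEEPER-157.patch: the class name MajorityLeaderCheck, its methods, and the use of plain long peer ids are all hypothetical. The idea is that a joining peer records which leader each peer reports, and only accepts a leader once more than half of the ensemble reports the same one, so a single stale peer claiming leadership is not enough.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not ZooKeeper's election code: a joining peer
// accepts a leader only after a strict majority of the ensemble names it.
public class MajorityLeaderCheck {
    private final int ensembleSize;
    // Latest leader reported by each peer: voter id -> claimed leader id.
    private final Map<Long, Long> votes = new HashMap<>();

    public MajorityLeaderCheck(int ensembleSize) {
        this.ensembleSize = ensembleSize;
    }

    // Record (or overwrite) the leader that a given peer currently reports.
    public void recordVote(long voterId, long claimedLeaderId) {
        votes.put(voterId, claimedLeaderId);
    }

    // Return the leader id backed by a strict majority, or null if none yet.
    public Long confirmedLeader() {
        Map<Long, Integer> tally = new HashMap<>();
        for (long leader : votes.values()) {
            int count = tally.merge(leader, 1, Integer::sum);
            if (count > ensembleSize / 2) {
                return leader;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MajorityLeaderCheck check = new MajorityLeaderCheck(5);
        check.recordVote(1, 1); // stale peer 1 still claims to be leader
        check.recordVote(2, 3);
        check.recordVote(3, 3);
        System.out.println(check.confirmedLeader()); // no majority yet
        check.recordVote(4, 3);
        System.out.println(check.confirmedLeader()); // peer 3 has 3 of 5
    }
}
```

In this sketch the claim from peer 1 is never enough on its own, which corresponds to the false-leader case in the description. The cost is that the joining peer has to wait for several notifications before it can follow anyone.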