Never mind. I am on the wrong track. Flavio's earlier mail did clarify that the follower received the epoch before restart.
On Fri, Jun 18, 2010 at 6:20 PM, Vishal K <vishalm...@gmail.com> wrote:

> I might be wrong here, but let me try to chip in my few cents.
>
> I think the problem is in LearnerHandler.java at the leader of this
> Follower.
>
> /* see what other packets from the proposal
>  * and tobeapplied queues need to be sent
>  * and then decide if we can just send a DIFF
>  * or we actually need to send the whole snapshot
>  */
> long leaderLastZxid = leader.startForwarding(this, updates);
> ---> this leaderLastZxid returned is probably incorrect.
> // a special case when both the ids are the same
> if (peerLastZxid == leaderLastZxid) {
>     packetToSend = Leader.DIFF;
>     zxidToSend = leaderLastZxid;
> }
>
> QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
>         leaderLastZxid, null, null);
> oa.writeRecord(newLeaderQP, "packet");
> bufferedOutput.flush();
>
>
> On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) <
> j...@apache.org> wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880320#action_12880320]
>>
>> Flavio Paiva Junqueira commented on ZOOKEEPER-335:
>> --------------------------------------------------
>>
>> Guys, I don't see enough information in these logs to determine what's
>> going on. Let me tell you what I'm seeing so that perhaps other folks can
>> help me out here.
>>
>> One part of the log that is suspicious is this one:
>>
>> {noformat}
>> =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x300000001 expected 0x1
>> =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x300000001 expected 0x1
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]
>>
>> ************* NODE RESTARTED HERE **********************
>> {noformat}
>>
>> Before being restarted, the bad node receives a proposal with zxid <3,1>
>> and it expects <0,1>. Next in the logs after being restarted, I can see that
>> it is complaining that it has epoch 4 and the leader 3. Something strange
>> apparently happened during the restart.
>> It also seems to be the case that the node was able to talk to the
>> others (first entries in the log before the excerpt above).
>>
>> Do you guys see anything I'm overlooking?
>>
>> > zookeeper servers should commit the new leader txn to their logs.
>> > -----------------------------------------------------------------
>> >
>> >                 Key: ZOOKEEPER-335
>> >                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
>> >             Project: Zookeeper
>> >          Issue Type: Bug
>> >          Components: server
>> >    Affects Versions: 3.1.0
>> >            Reporter: Mahadev konar
>> >            Assignee: Mahadev konar
>> >            Priority: Blocker
>> >             Fix For: 3.4.0
>> >
>> >         Attachments: zk.log.gz, zklogs.tar.gz
>> >
>> >
>> > Currently the zookeeper followers do not commit the new leader election.
>> > This will cause problems in failure scenarios, with a follower acking
>> > the same leader txn id twice, possibly for two different intermittent
>> > leaders, and allowing them to propose two different txns with the same
>> > zxid.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
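
For anyone reading the quoted log later: the warning "Got zxid 0x300000001 expected 0x1" maps onto the <3,1> versus <0,1> pairs in Flavio's comment because a zxid is a 64-bit value with the epoch in the high 32 bits and a counter in the low 32 bits. The sketch below only illustrates that split. The class and method names (ZxidDemo, epochOf, counterOf, describe) are made up for this mail and are not taken from the ZooKeeper source.

```java
public class ZxidDemo {
    // Hypothetical helpers, not ZooKeeper's own API.
    // Assumption: epoch lives in the high 32 bits of the zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Assumption: the per-epoch counter lives in the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Render a zxid in the <epoch,counter> notation used in the thread.
    static String describe(long zxid) {
        return "<" + epochOf(zxid) + "," + counterOf(zxid) + ">";
    }

    public static void main(String[] args) {
        long got = 0x300000001L;   // value from the WARN line
        long expected = 0x1L;      // value the follower expected
        System.out.println("got " + describe(got) + ", expected " + describe(expected));
    }
}
```

Read this way, the follower was still expecting a proposal in epoch 0 while the leader was already proposing in epoch 3.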