[jira] Updated: (ZOOKEEPER-788) Add server id to message logs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated ZOOKEEPER-788: - Status: Open (was: Patch Available) forgot --no-prefix Add server id to message logs - Key: ZOOKEEPER-788 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.1 Reporter: Ivan Kelly Assignee: Ivan Kelly Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-778.patch As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-788) Add server id to message logs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated ZOOKEEPER-788: - Attachment: ZOOKEEPER-778.patch Add server id to message logs - Key: ZOOKEEPER-788 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.1 Reporter: Ivan Kelly Assignee: Ivan Kelly Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-778.patch As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-788) Add server id to message logs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated ZOOKEEPER-788: - Status: Patch Available (was: Open) second attempt Add server id to message logs - Key: ZOOKEEPER-788 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.1 Reporter: Ivan Kelly Assignee: Ivan Kelly Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-778.patch As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-788) Add server id to message logs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated ZOOKEEPER-788: - Attachment: (was: ZOOKEEPER-778.patch) Add server id to message logs - Key: ZOOKEEPER-788 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.1 Reporter: Ivan Kelly Assignee: Ivan Kelly Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-778.patch As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-788) Add server id to message logs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880122#action_12880122 ] Hadoop QA commented on ZOOKEEPER-788: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12447440/ZOOKEEPER-778.patch against trunk revision 953041. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/86/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/86/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/86/console This message is automatically generated. Add server id to message logs - Key: ZOOKEEPER-788 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.1 Reporter: Ivan Kelly Assignee: Ivan Kelly Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-778.patch As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-788) Add server id to message logs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880163#action_12880163 ] Flavio Paiva Junqueira commented on ZOOKEEPER-788: -- +1, looks good for me. Add server id to message logs - Key: ZOOKEEPER-788 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.1 Reporter: Ivan Kelly Assignee: Ivan Kelly Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-778.patch As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880202#action_12880202 ] Flavio Paiva Junqueira commented on ZOOKEEPER-335: -- Mike, There is one thing I don't understand. From the logs, it looks like servers 1 and 3 are proposing a zxid of 0 (second field of notification) during election, which makes me think that they had no state at all: {noformat} 2010-06-17 14:35:40,714 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 2, 8589934884, 2, 2, LOOKING, LOOKING, 2 2010-06-17 14:35:40,714 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 0, 1, 2, LOOKING, FOLLOWING, 1 2010-06-17 14:35:40,714 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 0, 1, 2, LOOKING, LEADING, 3 {noformat} Server 2 on the other hand had accepted updates based on the zxid it proposes. Were they supposed to have no state at all? Have you deleted your logs and snapshots before restarting the servers? zookeeper servers should commit the new leader txn to their logs. - Key: ZOOKEEPER-335 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.1.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.4.0 Attachments: zk.log.gz currently the zookeeper followers do not commit the new leader election. This will cause problems in a failure scenarios with a follower acking to the same leader txn id twice, which might be two different intermittent leaders and allowing them to propose two different txn's of the same zxid. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal K updated ZOOKEEPER-335: --- Attachment: zklogs.tar.gz Hi, I have attached the logs. The log entries are similar to those reported by others. We are testing with 3 nodes. Each node is run in a VM running SLES 11. All 3 VMs are run on the same host. VMs are sharing the same disk. cpuinfo and meminfo for VM is in the attached file. I have also tried to collect more info with -verbose:gc -Xloggc:/../ -Xprof options to java. gc.lg contains the gc output rest of the info should be in msgs.log Default java heap size was used. java version 1.6.0_18 was used. One point to note - In my case, on the misbehaving node could not joing the cluster. Rest of the cluster was stable (except for the flood of log messages on the leader because the misbehaving follower kept terminating session). Hope this helps. Thanks. zookeeper servers should commit the new leader txn to their logs. - Key: ZOOKEEPER-335 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.1.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.4.0 Attachments: zk.log.gz, zklogs.tar.gz currently the zookeeper followers do not commit the new leader election. This will cause problems in a failure scenarios with a follower acking to the same leader txn id twice, which might be two different intermittent leaders and allowing them to propose two different txn's of the same zxid. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320 ] Flavio Paiva Junqueira commented on ZOOKEEPER-335: -- Guys, I don't see enough information in these logs to determine what's going on. Let me tell you what I'm seeing so that perhaps other folks can help me out here. One part of the log that is suspicious is this one: {noformat} =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32] * NODE RESTARTED HERE ** {noformat} Before being restarted, the bad node receives a proposal with zxid 3,1 and it expects 0,1. Next in the logs after being restarted, I can see that it is complaining that it has epoch 4 and the leader 3. Something strange apparently happened during the restart. It also seems to be the case that the node was being able to talk to the others (first entries in the log before the excerpt above). Do you guys see anything I'm overlooking? zookeeper servers should commit the new leader txn to their logs. - Key: ZOOKEEPER-335 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.1.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.4.0 Attachments: zk.log.gz, zklogs.tar.gz currently the zookeeper followers do not commit the new leader election. This will cause problems in a failure scenarios with a follower acking to the same leader txn id twice, which might be two different intermittent leaders and allowing them to propose two different txn's of the same zxid. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-789) Improve FLE log messages
Improve FLE log messages Key: ZOOKEEPER-789 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-789 Project: Zookeeper Issue Type: Improvement Reporter: Flavio Paiva Junqueira Notification messages are quite important to determine what is going with leader election. The main idea of this improvement is name the fields we output in notification log messages. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
I might be wrong here, but let me try to chip in my few cents. I think the problem is in LearnerHandler.java at the leader fo this Follower. /* see what other packets from the proposal * and tobeapplied queues need to be sent * and then decide if we can just send a DIFF * or we actually need to send the whole snapshot */ long leaderLastZxid = leader.startForwarding(this, updates); --- this leaderLastZxid returned is probably incorrect. // a special case when both the ids are the same if (peerLastZxid == leaderLastZxid) { packetToSend = Leader.DIFF; zxidToSend = leaderLastZxid; } QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER, leaderLastZxid, null, null); oa.writeRecord(newLeaderQP, packet); bufferedOutput.flush() On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320] Flavio Paiva Junqueira commented on ZOOKEEPER-335: -- Guys, I don't see enough information in these logs to determine what's going on. Let me tell you what I'm seeing so that perhaps other folks can help me out here. One part of the log that is suspicious is this one: {noformat} =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32] * NODE RESTARTED HERE ** {noformat} Before being restarted, the bad node receives a proposal with zxid 3,1 and it expects 0,1. Next in the logs after being restarted, I can see that it is complaining that it has epoch 4 and the leader 3. Something strange apparently happened during the restart. It also seems to be the case that the node was being able to talk to the others (first entries in the log before the excerpt above). Do you guys see anything I'm overlooking? zookeeper servers should commit the new leader txn to their logs. - Key: ZOOKEEPER-335 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.1.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.4.0 Attachments: zk.log.gz, zklogs.tar.gz currently the zookeeper followers do not commit the new leader election. This will cause problems in a failure scenarios with a follower acking to the same leader txn id twice, which might be two different intermittent leaders and allowing them to propose two different txn's of the same zxid. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
Nevermind. I am on the wrong track. Flavio's earlier mail did clarify that the follower received the epoch before restart. On Fri, Jun 18, 2010 at 6:20 PM, Vishal K vishalm...@gmail.com wrote: I might be wrong here, but let me try to chip in my few cents. I think the problem is in LearnerHandler.java at the leader fo this Follower. /* see what other packets from the proposal * and tobeapplied queues need to be sent * and then decide if we can just send a DIFF * or we actually need to send the whole snapshot */ long leaderLastZxid = leader.startForwarding(this, updates); --- this leaderLastZxid returned is probably incorrect. // a special case when both the ids are the same if (peerLastZxid == leaderLastZxid) { packetToSend = Leader.DIFF; zxidToSend = leaderLastZxid; } QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER, leaderLastZxid, null, null); oa.writeRecord(newLeaderQP, packet); bufferedOutput.flush() On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320] Flavio Paiva Junqueira commented on ZOOKEEPER-335: -- Guys, I don't see enough information in these logs to determine what's going on. Let me tell you what I'm seeing so that perhaps other folks can help me out here. One part of the log that is suspicious is this one: {noformat} =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32] * NODE RESTARTED HERE ** {noformat} Before being restarted, the bad node receives a proposal with zxid 3,1 and it expects 0,1. Next in the logs after being restarted, I can see that it is complaining that it has epoch 4 and the leader 3. Something strange apparently happened during the restart. It also seems to be the case that the node was being able to talk to the others (first entries in the log before the excerpt above). Do you guys see anything I'm overlooking? zookeeper servers should commit the new leader txn to their logs. - Key: ZOOKEEPER-335 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.1.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.4.0 Attachments: zk.log.gz, zklogs.tar.gz currently the zookeeper followers do not commit the new leader election. This will cause problems in a failure scenarios with a follower acking to the same leader txn id twice, which might be two different intermittent leaders and allowing them to propose two different txn's of the same zxid. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
Hi Flavio, I have 3 set of logs and they all seem to indicate two problems on the misbehaving follower: Problem 1: Expected zxid is incorrect =0[QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30002 expected 0x1 =0[QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30002 expected 0x1 =2495 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x40001 expected 0x1 =2495 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x40001 expected 0x1 =191617 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x50001 expected 0x1 =191617 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x50001 expected 0x1 =0[QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x60001 expected 0x1 =0[QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x60001 expected 0x1 =245016 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x70001 expected 0x1 =245016 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x70001 expected 0x1 Note expected zxid is always 0x1 (lastQueued is always 0?) Problem 2: While joining the cluster expected epoch is 1 higher than seen earlier =14991 [QuorumPeer:/0.0.0.0:2181] FATAL org.apache.zookeeper.server.quorum.Learner - Leader epoch 7 is less than our epoch 8 -Vishal On Fri, Jun 18, 2010 at 6:33 PM, Vishal K vishalm...@gmail.com wrote: Nevermind. I am on the wrong track. Flavio's earlier mail did clarify that the follower received the epoch before restart. On Fri, Jun 18, 2010 at 6:20 PM, Vishal K vishalm...@gmail.com wrote: I might be wrong here, but let me try to chip in my few cents. I think the problem is in LearnerHandler.java at the leader fo this Follower. /* see what other packets from the proposal * and tobeapplied queues need to be sent * and then decide if we can just send a DIFF * or we actually need to send the whole snapshot */ long leaderLastZxid = leader.startForwarding(this, updates); --- this leaderLastZxid returned is probably incorrect. // a special case when both the ids are the same if (peerLastZxid == leaderLastZxid) { packetToSend = Leader.DIFF; zxidToSend = leaderLastZxid; } QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER, leaderLastZxid, null, null); oa.writeRecord(newLeaderQP, packet); bufferedOutput.flush() On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320] Flavio Paiva Junqueira commented on ZOOKEEPER-335: -- Guys, I don't see enough information in these logs to determine what's going on. Let me tell you what I'm seeing so that perhaps other folks can help me out here. One part of the log that is suspicious is this one: {noformat} =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x30001 expected 0x1 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32] * NODE RESTARTED HERE ** {noformat} Before being restarted, the bad node