[jira] Updated: (ZOOKEEPER-788) Add server id to message logs

2010-06-18 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated ZOOKEEPER-788:
-

Status: Open  (was: Patch Available)

forgot --no-prefix

 Add server id to message logs
 -

 Key: ZOOKEEPER-788
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.3.1
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-778.patch


 As discussed on IRC. The log visualisation needs some way of determining 
 which server made which log. If the log segment is taken for a time period 
 where no elections take place, there is no way to determine the id of the 
 server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-788) Add server id to message logs

2010-06-18 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated ZOOKEEPER-788:
-

Attachment: ZOOKEEPER-778.patch

 Add server id to message logs
 -

 Key: ZOOKEEPER-788
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.3.1
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-778.patch


 As discussed on IRC. The log visualisation needs some way of determining 
 which server made which log. If the log segment is taken for a time period 
 where no elections take place, there is no way to determine the id of the 
 server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-788) Add server id to message logs

2010-06-18 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated ZOOKEEPER-788:
-

Status: Patch Available  (was: Open)

second attempt

 Add server id to message logs
 -

 Key: ZOOKEEPER-788
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.3.1
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-778.patch


 As discussed on IRC. The log visualisation needs some way of determining 
 which server made which log. If the log segment is taken for a time period 
 where no elections take place, there is no way to determine the id of the 
 server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-788) Add server id to message logs

2010-06-18 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated ZOOKEEPER-788:
-

Attachment: (was: ZOOKEEPER-778.patch)

 Add server id to message logs
 -

 Key: ZOOKEEPER-788
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.3.1
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-778.patch


 As discussed on IRC. The log visualisation needs some way of determining 
 which server made which log. If the log segment is taken for a time period 
 where no elections take place, there is no way to determine the id of the 
 server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-788) Add server id to message logs

2010-06-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880122#action_12880122
 ] 

Hadoop QA commented on ZOOKEEPER-788:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447440/ZOOKEEPER-778.patch
  against trunk revision 953041.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/86/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/86/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/86/console

This message is automatically generated.

 Add server id to message logs
 -

 Key: ZOOKEEPER-788
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.3.1
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-778.patch


 As discussed on IRC. The log visualisation needs some way of determining 
 which server made which log. If the log segment is taken for a time period 
 where no elections take place, there is no way to determine the id of the 
 server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-788) Add server id to message logs

2010-06-18 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880163#action_12880163
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-788:
--

+1, looks good for me.

 Add server id to message logs
 -

 Key: ZOOKEEPER-788
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-788
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.3.1
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-778.patch


 As discussed on IRC. The log visualisation needs some way of determining 
 which server made which log. If the log segment is taken for a time period 
 where no elections take place, there is no way to determine the id of the 
 server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

2010-06-18 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880202#action_12880202
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-335:
--

Mike, There is one thing I don't understand. From the logs, it looks like 
servers 1 and 3 are proposing a zxid of 0 (second field of notification) during 
election, which makes me think that they had no state at all:

{noformat}
2010-06-17 14:35:40,714 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 2, 
8589934884, 2, 2, LOOKING, LOOKING, 2
2010-06-17 14:35:40,714 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 0, 
1, 2, LOOKING, FOLLOWING, 1
2010-06-17 14:35:40,714 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 0, 
1, 2, LOOKING, LEADING, 3
{noformat}

Server 2 on the other hand had accepted updates based on the zxid it proposes. 
Were they supposed to have no state at all? Have you deleted your logs and 
snapshots before restarting the servers? 

 zookeeper servers should commit the new leader txn to their logs.
 -

 Key: ZOOKEEPER-335
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.1.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.4.0

 Attachments: zk.log.gz


 currently the zookeeper followers do not commit the new leader election. This 
 will cause problems in a failure scenarios with a follower acking to the same 
 leader txn id twice, which might be two different intermittent leaders and 
 allowing them to propose two different txn's of the same zxid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

2010-06-18 Thread Vishal K (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal K updated ZOOKEEPER-335:
---

Attachment: zklogs.tar.gz

Hi,

I have attached the logs. The log entries are similar to those reported by 
others.

We are testing with 3 nodes. Each node is run in a VM running SLES 11. All 3 
VMs are run on the same host. VMs are sharing the same disk. cpuinfo and 
meminfo for VM is in the attached file.
I have also tried to collect more info with -verbose:gc -Xloggc:/../ -Xprof 
options to java.

gc.lg contains the gc output
rest of the info should be in msgs.log
Default java heap size was used. java version  1.6.0_18 was used.

One point to note - In my case, on the misbehaving node could not joing the 
cluster. Rest of the cluster was stable (except for the flood of log messages 
on the leader because the misbehaving follower kept terminating session).

Hope this helps.
Thanks.

 zookeeper servers should commit the new leader txn to their logs.
 -

 Key: ZOOKEEPER-335
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.1.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.4.0

 Attachments: zk.log.gz, zklogs.tar.gz


 currently the zookeeper followers do not commit the new leader election. This 
 will cause problems in a failure scenarios with a follower acking to the same 
 leader txn id twice, which might be two different intermittent leaders and 
 allowing them to propose two different txn's of the same zxid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

2010-06-18 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-335:
--

Guys, I don't see enough information in these logs to determine what's going 
on. Let me tell you what I'm seeing so that perhaps other folks can help me out 
here. 

One part of the log that is suspicious is this one:

{noformat}
=6693 [QuorumPeer:/0.0.0.0:2181] WARN  
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected 0x1
=6693 [QuorumPeer:/0.0.0.0:2181] WARN  
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected 0x1
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]

* NODE RESTARTED HERE **
{noformat}

Before being restarted, the bad node receives a proposal with zxid 3,1 and it 
expects 0,1. Next in the logs after being restarted, I can see that it is 
complaining that it has epoch 4 and the leader 3. Something strange apparently 
happened during the restart. It also seems to be the case that the node was 
being able to talk to the others (first entries in the log before the excerpt 
above).

Do you guys see anything I'm overlooking?

 zookeeper servers should commit the new leader txn to their logs.
 -

 Key: ZOOKEEPER-335
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.1.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.4.0

 Attachments: zk.log.gz, zklogs.tar.gz


 currently the zookeeper followers do not commit the new leader election. This 
 will cause problems in a failure scenarios with a follower acking to the same 
 leader txn id twice, which might be two different intermittent leaders and 
 allowing them to propose two different txn's of the same zxid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-789) Improve FLE log messages

2010-06-18 Thread Flavio Paiva Junqueira (JIRA)
Improve FLE log messages


 Key: ZOOKEEPER-789
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-789
 Project: Zookeeper
  Issue Type: Improvement
Reporter: Flavio Paiva Junqueira


Notification messages are quite important to determine what is going with 
leader election. The main idea of this improvement is name the fields we output 
in notification log messages.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

2010-06-18 Thread Vishal K
I might be wrong here, but let me try to chip in my few cents.

I think the problem is in LearnerHandler.java at the leader fo this
Follower.

/* see what other packets from the proposal
 * and tobeapplied queues need to be sent
 * and then decide if we can just send a DIFF
 * or we actually need to send the whole snapshot
 */
long leaderLastZxid = leader.startForwarding(this, updates);
--- this leaderLastZxid returned is probably incorrect.
// a special case when both the ids are the same
if (peerLastZxid == leaderLastZxid) {
packetToSend = Leader.DIFF;
zxidToSend = leaderLastZxid;
}

QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
leaderLastZxid, null, null);
oa.writeRecord(newLeaderQP, packet);
bufferedOutput.flush()


On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) 
j...@apache.org wrote:


[
 https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320]

 Flavio Paiva Junqueira commented on ZOOKEEPER-335:
 --

 Guys, I don't see enough information in these logs to determine what's
 going on. Let me tell you what I'm seeing so that perhaps other folks can
 help me out here.

 One part of the log that is suspicious is this one:

 {noformat}
 =6693 [QuorumPeer:/0.0.0.0:2181] WARN
  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected
 0x1
 =6693 [QuorumPeer:/0.0.0.0:2181] WARN
  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected
 0x1
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]

 * NODE RESTARTED HERE **
 {noformat}

 Before being restarted, the bad node receives a proposal with zxid 3,1
 and it expects 0,1. Next in the logs after being restarted, I can see that
 it is complaining that it has epoch 4 and the leader 3. Something strange
 apparently happened during the restart. It also seems to be the case that
 the node was being able to talk to the others (first entries in the log
 before the excerpt above).

 Do you guys see anything I'm overlooking?

  zookeeper servers should commit the new leader txn to their logs.
  -
 
  Key: ZOOKEEPER-335
  URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
  Project: Zookeeper
   Issue Type: Bug
   Components: server
 Affects Versions: 3.1.0
 Reporter: Mahadev konar
 Assignee: Mahadev konar
 Priority: Blocker
  Fix For: 3.4.0
 
  Attachments: zk.log.gz, zklogs.tar.gz
 
 
  currently the zookeeper followers do not commit the new leader election.
 This will cause problems in a failure scenarios with a follower acking to
 the same leader txn id twice, which might be two different intermittent
 leaders and allowing them to propose two different txn's of the same zxid.

 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.




Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

2010-06-18 Thread Vishal K
Nevermind. I am on the wrong track. Flavio's earlier mail did clarify that
the follower received the epoch before restart.

On Fri, Jun 18, 2010 at 6:20 PM, Vishal K vishalm...@gmail.com wrote:

 I might be wrong here, but let me try to chip in my few cents.

 I think the problem is in LearnerHandler.java at the leader fo this
 Follower.

 /* see what other packets from the proposal
  * and tobeapplied queues need to be sent
  * and then decide if we can just send a DIFF
  * or we actually need to send the whole snapshot
  */
 long leaderLastZxid = leader.startForwarding(this, updates);
 --- this leaderLastZxid returned is probably incorrect.
 // a special case when both the ids are the same
 if (peerLastZxid == leaderLastZxid) {
 packetToSend = Leader.DIFF;
 zxidToSend = leaderLastZxid;
 }

 QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
 leaderLastZxid, null, null);
 oa.writeRecord(newLeaderQP, packet);
 bufferedOutput.flush()



 On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) 
 j...@apache.org wrote:


[
 https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320]

 Flavio Paiva Junqueira commented on ZOOKEEPER-335:
 --

 Guys, I don't see enough information in these logs to determine what's
 going on. Let me tell you what I'm seeing so that perhaps other folks can
 help me out here.

 One part of the log that is suspicious is this one:

 {noformat}
 =6693 [QuorumPeer:/0.0.0.0:2181] WARN
  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected
 0x1
 =6693 [QuorumPeer:/0.0.0.0:2181] WARN
  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected
 0x1
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]

 * NODE RESTARTED HERE **
 {noformat}

 Before being restarted, the bad node receives a proposal with zxid 3,1
 and it expects 0,1. Next in the logs after being restarted, I can see that
 it is complaining that it has epoch 4 and the leader 3. Something strange
 apparently happened during the restart. It also seems to be the case that
 the node was being able to talk to the others (first entries in the log
 before the excerpt above).

 Do you guys see anything I'm overlooking?

  zookeeper servers should commit the new leader txn to their logs.
  -
 
  Key: ZOOKEEPER-335
  URL:
 https://issues.apache.org/jira/browse/ZOOKEEPER-335
  Project: Zookeeper
   Issue Type: Bug
   Components: server
 Affects Versions: 3.1.0
 Reporter: Mahadev konar
 Assignee: Mahadev konar
 Priority: Blocker
  Fix For: 3.4.0
 
  Attachments: zk.log.gz, zklogs.tar.gz
 
 
  currently the zookeeper followers do not commit the new leader election.
 This will cause problems in a failure scenarios with a follower acking to
 the same leader txn id twice, which might be two different intermittent
 leaders and allowing them to propose two different txn's of the same zxid.

 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.





Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

2010-06-18 Thread Vishal K
Hi Flavio,

I have 3 set of logs and they all seem to indicate two problems on the
misbehaving follower:

Problem 1: Expected zxid is incorrect
=0[QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30002 expected
0x1
=0[QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30002 expected
0x1
=2495 [QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x40001 expected
0x1
=2495 [QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x40001 expected
0x1
=191617 [QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x50001 expected
0x1
=191617 [QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x50001 expected
0x1
=0[QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x60001 expected
0x1
=0[QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x60001 expected
0x1
=245016 [QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x70001 expected
0x1
=245016 [QuorumPeer:/0.0.0.0:2181] WARN
org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x70001 expected
0x1

Note expected zxid is always 0x1 (lastQueued is always 0?)

Problem 2: While joining the cluster expected epoch is 1 higher than seen
earlier
=14991 [QuorumPeer:/0.0.0.0:2181] FATAL
org.apache.zookeeper.server.quorum.Learner  - Leader epoch 7 is less than
our epoch 8

-Vishal

On Fri, Jun 18, 2010 at 6:33 PM, Vishal K vishalm...@gmail.com wrote:


 Nevermind. I am on the wrong track. Flavio's earlier mail did clarify that
 the follower received the epoch before restart.


 On Fri, Jun 18, 2010 at 6:20 PM, Vishal K vishalm...@gmail.com wrote:

 I might be wrong here, but let me try to chip in my few cents.

 I think the problem is in LearnerHandler.java at the leader fo this
 Follower.

 /* see what other packets from the proposal
  * and tobeapplied queues need to be sent
  * and then decide if we can just send a DIFF
  * or we actually need to send the whole snapshot
  */
 long leaderLastZxid = leader.startForwarding(this, updates);
 --- this leaderLastZxid returned is probably incorrect.
 // a special case when both the ids are the same
 if (peerLastZxid == leaderLastZxid) {
 packetToSend = Leader.DIFF;
 zxidToSend = leaderLastZxid;
 }

 QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
 leaderLastZxid, null, null);
 oa.writeRecord(newLeaderQP, packet);
 bufferedOutput.flush()



 On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) 
 j...@apache.org wrote:


[
 https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880320#action_12880320]

 Flavio Paiva Junqueira commented on ZOOKEEPER-335:
 --

 Guys, I don't see enough information in these logs to determine what's
 going on. Let me tell you what I'm seeing so that perhaps other folks can
 help me out here.

 One part of the log that is suspicious is this one:

 {noformat}
 =6693 [QuorumPeer:/0.0.0.0:2181] WARN
  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected
 0x1
 =6693 [QuorumPeer:/0.0.0.0:2181] WARN
  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x30001 expected
 0x1
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
 [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]

 * NODE RESTARTED HERE **
 {noformat}

 Before being restarted, the bad node