[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-07-20 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890199#action_12890199
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-822:
--

Hi Vishal, do you think you could upload all three log files for a problematic 
run? We would like to put them on loggraph to visualize what's going on there. 
It sounds like it is somehow related to the VM reboots, but I don't know why yet.

 Leader election taking a long time to complete
 ---

 Key: ZOOKEEPER-822
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Priority: Blocker
 Attachments: test_zookeeper_1.log, test_zookeeper_2.log


 Created a 3-node cluster.
 1. Fail the ZK leader.
 2. Let leader election finish. Restart the leader and let it join the 
 3. Repeat.
 After a few rounds, leader election takes anywhere from 25 to 60 seconds to finish. 
 Note: we didn't have any ZK clients, and no new znodes were created.
 zoo.cfg is shown below:
 #Mon Jul 19 12:15:10 UTC 2010
 server.1=192.168.4.12\:2888\:3888
 server.0=192.168.4.11\:2888\:3888
 clientPort=2181
 dataDir=/var/zookeeper
 syncLimit=2
 server.2=192.168.4.13\:2888\:3888
 initLimit=5
 tickTime=2000
 I have attached logs from the two nodes that took a long time to form the 
 cluster after failing the leader. The leader was down anyway, so logs from 
 that node shouldn't matter.
 Look for START HERE. Logs after that point should be of interest.
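
For reference, the tick-based limits in the zoo.cfg above can be turned into wall-clock values. This is a minimal sketch and not part of the report: `parse_zoo_cfg` and `limits_in_ms` are hypothetical helper names, and the only assumption is the documented meaning of `initLimit` and `syncLimit` as multiples of `tickTime` (which is given in milliseconds).

```python
# Sketch only: parse_zoo_cfg and limits_in_ms are hypothetical helpers,
# not part of ZooKeeper. The config text is copied from the report.
ZOO_CFG = r"""
#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000
"""


def parse_zoo_cfg(text):
    """Parse key=value lines, skipping comments and undoing the '\\:' escapes."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().replace("\\:", ":")
    return cfg


def limits_in_ms(cfg):
    """Convert the tick-based limits into milliseconds."""
    tick = int(cfg["tickTime"])
    return {
        "initLimit_ms": int(cfg["initLimit"]) * tick,
        "syncLimit_ms": int(cfg["syncLimit"]) * tick,
    }


cfg = parse_zoo_cfg(ZOO_CFG)
limits = limits_in_ms(cfg)
print(limits)
```

With the values from the report, `initLimit` corresponds to 10 seconds and `syncLimit` to 4 seconds.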

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-20 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-790:
-

Attachment: ZOOKEEPER-790.patch

This patch is very simple: it moves two function calls so that a leader only 
starts up and sets the last processed zxid after it has a quorum of 
supporters. It also includes a unit test. I had to make a few modifications 
here and there to be able to write this test, and I tried to keep the changes 
as small as possible.

 Last processed zxid set prematurely while establishing leadership
 -

 Key: ZOOKEEPER-790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.1
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2


 The leader code sets the last processed zxid to the first zxid of the new 
 epoch before it has connected to a quorum of followers (Leader.java:281), 
 and the follower code throws an IOException (Follower.java:73) if the 
 leader epoch is smaller than its own. As a result, when a false leader 
 drops leadership and becomes a follower, it finds a smaller epoch and 
 kills itself.
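
The failure described above can be sketched in a few lines. This is an illustration only, not the ZooKeeper code: the helper names are hypothetical and the zxid values are made up. The one fact it relies on is that a zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a counter.

```python
# Sketch only: helper names are hypothetical and the zxid values are made up.
EPOCH_SHIFT = 32
COUNTER_MASK = 0xFFFFFFFF


def make_zxid(epoch, counter):
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into a zxid."""
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)


def epoch_of(zxid):
    return zxid >> EPOCH_SHIFT


def counter_of(zxid):
    return zxid & COUNTER_MASK


def check_leader_epoch(leader_zxid, own_last_zxid):
    """Mimic the follower-side check: refuse a leader whose epoch is smaller."""
    if epoch_of(leader_zxid) < epoch_of(own_last_zxid):
        raise IOError("Error: epoch of leader is lower")


# A peer with history in epoch 37 starts leading and, before any quorum has
# formed, moves its last processed zxid to the first zxid of epoch 38.
last_zxid = make_zxid(37, 1200)
premature = make_zxid(epoch_of(last_zxid) + 1, 0)

# It then drops leadership and follows a leader that announces epoch 37.
try:
    check_leader_epoch(make_zxid(37, 1200), premature)
    outcome = "accepted"
except IOError as exc:
    outcome = "rejected: %s" % exc
print(outcome)
```

If the zxid were only advanced once a quorum of followers is connected, the peer in this sketch would still be at epoch 37 when it rejoins, and the check would pass.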




[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-20 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-790:
-

Status: Patch Available  (was: Open)




[jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-20 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890436#action_12890436
 ] 

Patrick Hunt commented on ZOOKEEPER-816:


You should also consider using Avro for marshalling/unmarshalling the 
records:
http://avro.apache.org/

It has lots of benefits; in particular, it's cross-language.

Re: writing to disk, perhaps just reuse the ZK WAL code and write to a disk 
that's not storing the transaction log.


 Detecting and diagnosing elusive bugs and faults in Zookeeper
 -

 Key: ZOOKEEPER-816
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-816
 Project: Zookeeper
  Issue Type: New Feature
Reporter: Miguel Correia
Priority: Minor

 Complex distributed systems like Zookeeper tend to fail in strange ways that 
 are hard to diagnose. The objective is to build a tool that helps understand 
 when and where these problems occurred based on Zookeeper's traces (i.e., 
 logs in TRACE level). Minor changes to the server code will be needed.




[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890451#action_12890451
 ] 

Hadoop QA commented on ZOOKEEPER-790:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449972/ZOOKEEPER-790.patch
  against trunk revision 963957.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/150/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/150/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/150/console

This message is automatically generated.




[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-20 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-790:
-

Attachment: ZOOKEEPER-790-3.3.patch

Uploading a patch for the 3.3 branch. I have also checked the Hudson results 
and couldn't find any Java test failure. The -1 on core tests seems to be 
unrelated to this patch.




[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-20 Thread Travis Crawford (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890512#action_12890512
 ] 

Travis Crawford commented on ZOOKEEPER-790:
---

I tested this patch on a build with the following, applied in the listed order: 
3.3.1 release + ZOOKEEPER-744.patch + ZOOKEEPER-790-3.3.patch

Looks good!

{code}
2010-07-20 23:43:34,229 - INFO  [Thread-2545:nioserverc...@1516] - Closed 
socket connection for client /10.209.21.181:53743 (no session established for 
client)
2010-07-20 23:43:34,659 - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@639] - 
EndOfStreamException: Unable to read additional data from client sessionid 
0x129d3fcb5a6f60d, likely client has closed socket
2010-07-20 23:43:34,660 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1516] - Closed socket 
connection for client /10.209.21.204:59727 which had sessionid 0x129d3fcb5a6f60d
2010-07-20 23:43:34,684 - INFO  [ProcessThread:-1:preprequestproces...@385] - 
Processed session termination for sessionid: 0x329d3fcb6594e53
2010-07-20 23:52:14,522 - INFO  [main:quorumpeercon...@90] - Reading 
configuration from: /etc/zookeeper/conf/zoo.cfg
2010-07-20 23:52:14,529 - INFO  [main:quorumpeercon...@287] - Defaulting to 
majority quorums
2010-07-20 23:52:14,540 - INFO  [main:quorumpeerm...@119] - Starting quorum peer
2010-07-20 23:52:14,562 - INFO  [main:nioservercnxn$fact...@149] - binding to 
port 0.0.0.0/0.0.0.0:2181
2010-07-20 23:52:14,578 - INFO  [main:quorump...@818] - tickTime set to 2000
2010-07-20 23:52:14,579 - INFO  [main:quorump...@829] - minSessionTimeout set 
to -1
2010-07-20 23:52:14,579 - INFO  [main:quorump...@840] - maxSessionTimeout set 
to -1
2010-07-20 23:52:14,579 - INFO  [main:quorump...@855] - initLimit set to 10
2010-07-20 23:52:14,798 - INFO  [main:files...@82] - Reading snapshot 
/data/zookeeper/version-2/snapshot.2500197ee5
2010-07-20 23:52:15,660 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@256] - 
Accepted socket connection from /10.209.45.76:57030
2010-07-20 23:52:15,661 - INFO  [Thread-1:quorumcnxmanager$liste...@436] - My 
election bind port: 3888
2010-07-20 23:52:15,663 - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@644] - Exception 
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
running
2010-07-20 23:52:15,664 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1516] - Closed socket 
connection for client /10.209.45.76:57030 (no session established for client)
2010-07-20 23:52:15,670 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
2010-07-20 23:52:15,672 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id 
=  1, Proposed zxid = 158915472832
2010-07-20 23:52:15,674 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1, 
158915472832, 1, 1, LOOKING, LOOKING, 1
2010-07-20 23:52:15,674 - INFO  [WorkerSender Thread:quorumcnxmana...@162] - 
Have smaller server identifier, so dropping the connection: (2, 1)
2010-07-20 23:52:15,675 - INFO  [WorkerSender Thread:quorumcnxmana...@162] - 
Have smaller server identifier, so dropping the connection: (3, 1)
2010-07-20 23:52:15,676 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 2, 
158915472832, 5, 1, LOOKING, LOOKING, 2
2010-07-20 23:52:15,676 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 3, 
158915472832, 5, 1, LOOKING, LOOKING, 2
2010-07-20 23:52:15,676 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@711] - Updating proposal
2010-07-20 23:52:15,677 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 
158915472832, 5, 1, LOOKING, FOLLOWING, 2
2010-07-20 23:52:15,677 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 3, 
158915472832, 5, 1, LOOKING, LOOKING, 3
2010-07-20 23:52:15,879 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
2010-07-20 23:52:15,885 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:lear...@72] - 
TCP NoDelay set to: true
2010-07-20 23:52:15,893 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server 
environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
2010-07-20 23:52:15,893 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server 
environment:host.name=sjc1k029.twitter.com
2010-07-20 23:52:15,894 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server 
environment:java.version=1.6.0_16
2010-07-20 23:52:15,894 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server 
environment:java.vendor=Sun Microsystems Inc.
2010-07-20 23:52:15,894 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server 
environment:java.home=/usr/java/jdk1.6.0_16/jre
2010-07-20 23:52:15,894 - INFO  

[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-20 Thread Travis Crawford (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890513#action_12890513
 ] 

Travis Crawford commented on ZOOKEEPER-790:
---

Just to double-check, I'm really really sure the running jar was freshly built. 
For example, lsof says:

java    4473 root  mem    REG  104,1  1012397  10092999 /usr/local/zookeeper/zookeeper-3.3.2-dev.jar

Yes, the version number is 3.3.2, but this was built from the 3.3.1 release.

Looking in the log we see:

2010-07-20 23:52:15,893 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server 
environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT

Should that say it was built this afternoon? I've double- and triple-checked 
and believe this is a newly built jar. Looking in the tarball, we don't see 
zookeeper-3.3.2-dev.jar until after building.
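
The mismatch can be made explicit by comparing the build stamp in the version line with the timestamp of the log entry itself. A minimal sketch, assuming the build date is written as MM/dd/yyyy (the log does not say); `parse_version_line` is a hypothetical helper, not part of ZooKeeper.

```python
import re
from datetime import datetime

# Sketch only: parse_version_line is a hypothetical helper, and reading the
# build date as MM/dd/yyyy is an assumption.
VERSION_RE = re.compile(
    r"zookeeper\.version=(?P<version>[\d.]+)-(?P<revision>\d+), "
    r"built on (?P<built>\d{2}/\d{2}/\d{4} \d{2}:\d{2}) GMT"
)


def parse_version_line(line):
    """Extract version, revision, and build time from a server environment line."""
    match = VERSION_RE.search(line)
    if match is None:
        return None
    return {
        "version": match.group("version"),
        "revision": match.group("revision"),
        "built": datetime.strptime(match.group("built"), "%m/%d/%Y %H:%M"),
    }


# Log line copied from the comment above, with the wrapped pieces rejoined.
LOG_LINE = (
    "2010-07-20 23:52:15,893 - INFO  "
    "[QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server "
    "environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT"
)

info = parse_version_line(LOG_LINE)
# The first 19 characters of the line are the log timestamp without millis.
logged_at = datetime.strptime(LOG_LINE[:19], "%Y-%m-%d %H:%M:%S")
days_old = (logged_at - info["built"]).days
print(info["version"], info["revision"], days_old)
```

Under that date-format assumption, the build stamp is more than two months older than the log entry, which is the discrepancy being asked about.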




Hudson build is back to normal : ZooKeeper-trunk #881

2010-07-20 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/881/




[jira] Commented: (ZOOKEEPER-719) Add throttling to BookKeeper client

2010-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890523#action_12890523
 ] 

Hudson commented on ZOOKEEPER-719:
--

Integrated in ZooKeeper-trunk #881 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/881/])


 Add throttling to BookKeeper client
 ---

 Key: ZOOKEEPER-719
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-719
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-719.patch, ZOOKEEPER-719.patch, 
 ZOOKEEPER-719.patch, ZOOKEEPER-719.patch


 Add throttling to client to control the rate of operations to bookies. 




[jira] Commented: (ZOOKEEPER-712) Bookie recovery

2010-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890524#action_12890524
 ] 

Hudson commented on ZOOKEEPER-712:
--

Integrated in ZooKeeper-trunk #881 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/881/])


 Bookie recovery
 ---

 Key: ZOOKEEPER-712
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-712
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Erwin Tam
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-712.patch


 Recover the ledger fragments of a bookie once it crashes.




[jira] Commented: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib

2010-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890525#action_12890525
 ] 

Hudson commented on ZOOKEEPER-799:
--

Integrated in ZooKeeper-trunk #881 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/881/])


 Add tools and recipes for monitoring as a contrib
 -

 Key: ZOOKEEPER-799
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib
Reporter: Andrei Savu
Assignee: Andrei Savu
 Fix For: 3.4.0

 Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch


 Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. 
