[jira] Commented: (ZOOKEEPER-502) bookkeeper create calls completion too many times

2009-08-08 Thread Utkarsh Srivastava (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740846#action_12740846
 ] 

Utkarsh Srivastava commented on ZOOKEEPER-502:
--

The cause is the following: 

In Action 1 of createLedger, 4 calls to zk are fired off: one getChildren() 
and 3 create()s. The callback for the getChildren() moves the op into Action 2, 
which is not correct because the 3 creates are still pending. The create 
callback then assumes that it's already in Action 2, and hence advances to 
Action 3. Now the same op is enqueued twice, and its action has been set to 3. 
The double queuing explains why the callback is called twice. 


In this sequence of events, action 2 is skipped altogether, and we end up 
with 0 bookies. If the dequeuer happens to dequeue the op before its action 
is set to 3, then action 2 is also carried out. This explains why we get 0 
bookies only sometimes rather than always (and also explains ZOOKEEPER-503). 
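One way to stop the op from advancing early is to count the outstanding calls and let only the last callback move it to the next action. The sketch below assumes a hypothetical `CreateLedgerOp` class. It is not the code in ZOOKEEPER-502.patch, and the "enqueue" is only a counter standing in for the real queue.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical op: tracks the ZooKeeper calls still outstanding for one action.
class CreateLedgerOp {
    private final AtomicInteger pending = new AtomicInteger();
    private final AtomicInteger enqueued = new AtomicInteger();
    private volatile int action = 1;

    // Call before firing the batch of asynchronous calls for the current action.
    void expect(int calls) {
        pending.set(calls);
    }

    // Invoked from every getChildren()/create() callback. Only the callback that
    // brings the count to zero advances the action and re-enqueues the op, so a
    // getChildren() reply can no longer move the op while creates are pending.
    void onCallback() {
        if (pending.decrementAndGet() == 0) {
            action++;
            enqueued.incrementAndGet(); // stand-in for putting the op back on the queue
        }
    }

    int action() { return action; }
    int timesEnqueued() { return enqueued.get(); }
}
```

With `expect(4)` (one getChildren() plus three create()s), four callbacks in any order leave the op at action 2, enqueued exactly once.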


In general, this style of asynchronous programming with stage numbers is 
error-prone and hard to read. Object creation is cheap, and operations like 
openLedger and createLedger are rare. Why not just use anonymous inner 
classes as callbacks instead of this state machine? 
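The sketch below shows the anonymous-inner-class style suggested above. Each step registers its own callback object, so no stage number is shared for concurrent replies to advance. `Callback`, `AsyncStore`, `LedgerCreator`, `FakeStore` and the path `/ledgers/available` are all hypothetical stand-ins, not the ZooKeeper or BookKeeper API. The ensemble is simply taken as the first N children.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical callback and store interfaces standing in for the ZooKeeper async API.
interface Callback<T> {
    void done(int rc, T result);
}

interface AsyncStore {
    void getChildren(String path, Callback<List<String>> cb);
    void create(String path, Callback<String> cb);
}

class LedgerCreator {
    private final AsyncStore store;

    LedgerCreator(AsyncStore store) { this.store = store; }

    // Step 1 lists the bookies; step 2 creates the ledger node; only step 2's
    // callback invokes the user's completion, so it runs exactly once per call.
    void createLedger(final int ensembleSize, final Callback<List<String>> completion) {
        store.getChildren("/ledgers/available", new Callback<List<String>>() {
            public void done(int rc, final List<String> bookies) {
                if (rc != 0 || bookies.size() < ensembleSize) {
                    completion.done(rc != 0 ? rc : -1, null);
                    return;
                }
                store.create("/ledgers/L", new Callback<String>() {
                    public void done(int rc2, String path) {
                        completion.done(rc2, rc2 == 0 ? bookies.subList(0, ensembleSize) : null);
                    }
                });
            }
        });
    }
}

// In-memory store that answers synchronously; enough to exercise the callback chain.
class FakeStore implements AsyncStore {
    public void getChildren(String path, Callback<List<String>> cb) {
        cb.done(0, Arrays.asList("bookie1", "bookie2", "bookie3"));
    }

    public void create(String path, Callback<String> cb) {
        cb.done(0, path);
    }
}
```

In this shape, the control flow is visible in the nesting. Each callback object exists for exactly one reply, which is the property the stage-number state machine loses.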

 bookkeeper create calls completion too many times
 -

 Key: ZOOKEEPER-502
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed
Assignee: Flavio Paiva Junqueira
 Attachments: ZOOKEEPER-502.patch


 When calling the asynchronous version of create, the completion routine is 
 called more than once.
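Callers can also guard against the symptom defensively by making a completion routine run at most once. The sketch below assumes a hypothetical `OnceCompletion` wrapper around an `IntConsumer` that receives the return code. It does not fix the underlying double enqueue. It also counts duplicate invocations so that a test can still detect the bug instead of hiding it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Forwards only the first completion to the delegate; later calls are counted.
class OnceCompletion implements IntConsumer {
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final AtomicInteger duplicates = new AtomicInteger();
    private final IntConsumer delegate;

    OnceCompletion(IntConsumer delegate) { this.delegate = delegate; }

    @Override
    public void accept(int rc) {
        // compareAndSet guarantees a single winner even if callbacks race.
        if (fired.compareAndSet(false, true)) {
            delegate.accept(rc);
        } else {
            duplicates.incrementAndGet();
        }
    }

    int duplicates() { return duplicates.get(); }
}
```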

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740848#action_12740848
 ] 

Hadoop QA commented on ZOOKEEPER-498:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415916/ZOOKEEPER-498.patch
  against trunk revision 802188.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/console

This message is automatically generated.

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, 
 ZOOKEEPER-498.patch, ZOOKEEPER-498.patch


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.
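The expected behaviour can be illustrated with a simplified weighted-majority check. In this check, weight-0 servers (the pods) add nothing to a quorum and should never be proposed as leader. The sketch assumes a hypothetical `WeightedQuorum` class with a single group. ZooKeeper's hierarchical quorum configuration also groups servers and requires a majority of groups, which this sketch ignores. It is not offered as the cause of or fix for this issue.

```java
import java.util.Map;
import java.util.Set;

// Simplified weighted-majority check over one group of servers.
class WeightedQuorum {
    private final Map<Long, Long> weights; // server id -> voting weight

    WeightedQuorum(Map<Long, Long> weights) { this.weights = weights; }

    // A set of servers is a quorum if its combined weight is strictly more
    // than half of the total weight; weight-0 servers contribute nothing.
    boolean containsQuorum(Set<Long> servers) {
        long total = 0;
        long acked = 0;
        for (Map.Entry<Long, Long> e : weights.entrySet()) {
            total += e.getValue();
            if (servers.contains(e.getKey())) {
                acked += e.getValue();
            }
        }
        return total > 0 && 2 * acked > total;
    }

    // Only servers with non-zero weight should be eligible to lead.
    boolean canLead(long sid) {
        Long w = weights.get(sid);
        return w != null && w > 0;
    }
}
```

With three DC servers at weight 1 and two pod servers at weight 0, any two DC servers form a quorum. One DC server plus both pods does not form a quorum.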




[jira] Commented: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740890#action_12740890
 ] 

Hudson commented on ZOOKEEPER-490:
--

Integrated in ZooKeeper-trunk #409 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/409/])
the java docs for session creation are misleading/incomplete


 the java docs for session creation are misleading/incomplete
 

 Key: ZOOKEEPER-490
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-490.patch


 the javadoc for ZooKeeper constructor says:
  * The client object will pick an arbitrary server and try to connect to 
 it.
  * If failed, it will try the next one in the list, until a connection is
  * established, or all the servers have been tried.
 The "or all the servers have been tried" phrase is misleading; it should 
 indicate that we retry until success, connection closed, or session expired. 
 We also need to mention that connection is asynchronous: the constructor 
 returns immediately, and you need to look for the connection event in the watcher.
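In the real client, `new ZooKeeper(connectString, sessionTimeout, watcher)` returns before a session exists. The watcher's `process(WatchedEvent)` later sees `Watcher.Event.KeeperState.SyncConnected`. Callers that need a live session therefore usually block on a latch. The sketch below models that pattern without the ZooKeeper jar. `FakeClient` and `ConnectExample` are hypothetical, and the "handshake" is just a delayed callback on a background thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical asynchronous client: the constructor returns immediately and
// the connection outcome arrives later through the watcher callback.
class FakeClient {
    enum State { CONNECTED, DISCONNECTED } // this fake only ever reports CONNECTED

    FakeClient(Consumer<State> watcher, long connectDelayMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(connectDelayMs); // simulated handshake with some server
                watcher.accept(State.CONNECTED);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }
}

class ConnectExample {
    // Blocks until the watcher reports CONNECTED or the timeout elapses.
    static boolean connectAndWait(long connectDelayMs, long timeoutMs) throws InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);
        new FakeClient(state -> {
            if (state == FakeClient.State.CONNECTED) {
                connected.countDown();
            }
        }, connectDelayMs);
        // The constructor has already returned here; the session may not exist yet.
        return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```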




[jira] Commented: (ZOOKEEPER-501) CnxManagerTest failed on hudson

2009-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740891#action_12740891
 ] 

Hudson commented on ZOOKEEPER-501:
--

Integrated in ZooKeeper-trunk #409 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/409/])
CnxManagerTest failed on hudson. (flavio via mahadev)


 CnxManagerTest failed on hudson
 ---

 Key: ZOOKEEPER-501
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-501
 Project: Zookeeper
  Issue Type: Bug
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.1, 3.3.0

 Attachments: CnxnManagerTest.log, ZOOKEEPER-501.patch


 It timed out according to the console output:
 http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/406/testReport/org.apache.zookeeper.test/CnxManagerTest/testCnxManager/




[jira] Created: (ZOOKEEPER-505) testAsyncCreateClose is badly broken

2009-08-08 Thread Utkarsh Srivastava (JIRA)
testAsyncCreateClose is badly broken


 Key: ZOOKEEPER-505
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-505
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava
Priority: Critical


The test case testAsyncCreateClose is badly broken. I was wondering why all the 
unit tests are passing in spite of having found so many different problems with 
LedgerManagementProcessor. 

There is a big try-catch block sitting in the test case that catches all 
exceptions, prints their stack traces, and exits, thereby allowing the test to 
pass. In general, unit tests shouldn't catch exceptions unless the exception is 
expected.

Another problem is that the same ControlObject is used for synchronization 
throughout. Since we already have the problem of callbacks being called 
multiple times (ZOOKEEPER-502), notify() on the control object is called too 
many times, resulting in the unit test not waiting for certain callbacks.

Thus the test never waits for asyncOpenLedger() to finish, and hence still 
succeeds. I believe asyncOpenLedger() has never worked right. 
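A test can give each asynchronous operation its own latch and invocation counter instead of sharing one ControlObject, and let failures propagate instead of catching them. The sketch below assumes a hypothetical `CallbackProbe` class. In a JUnit test, one probe per call (for example, asyncCreateLedger and asyncOpenLedger) would be fired from that call's callback, and the test method would simply declare `throws Exception`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// One latch and one counter per asynchronous operation, so an extra
// notification from one callback cannot release the wait for another.
class CallbackProbe {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicInteger calls = new AtomicInteger();

    // Call this from the operation's completion callback.
    void fire() {
        calls.incrementAndGet();
        done.countDown();
    }

    // Throws (rather than catching) if the callback never arrives or has
    // arrived more than once by the time the wait returns. A duplicate that
    // arrives after this check is not caught; re-check calls() at teardown.
    void awaitExactlyOnce(long timeoutMs) throws InterruptedException {
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new AssertionError("callback not invoked within " + timeoutMs + " ms");
        }
        if (calls.get() != 1) {
            throw new AssertionError("callback invoked " + calls.get() + " times");
        }
    }

    int calls() { return calls.get(); }
}
```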




[jira] Updated: (ZOOKEEPER-505) testAsyncCreateClose is badly broken

2009-08-08 Thread Utkarsh Srivastava (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Utkarsh Srivastava updated ZOOKEEPER-505:
-

Attachment: ZOOKEEPER-505.1.patch

Fix the unit test to not catch exceptions
Actually wait for openCallback to be called by using a new control object

 testAsyncCreateClose is badly broken
 

 Key: ZOOKEEPER-505
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-505
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava
Priority: Critical
 Attachments: ZOOKEEPER-505.1.patch






[jira] Updated: (ZOOKEEPER-505) testAsyncCreateClose is badly broken

2009-08-08 Thread Utkarsh Srivastava (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Utkarsh Srivastava updated ZOOKEEPER-505:
-

Status: Patch Available  (was: Open)





[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-499:


Status: Open  (was: Patch Available)

This looks good, Pat, but when you first get the logger, why are you using the 
package name? If you are going to use the package name, shouldn't you get the 
package from the class file?

In the second test, you get the logger using a package to add an appender, but 
remove it using the class. Couldn't that potentially cause a problem?
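The concern in the second comment can be reproduced with any hierarchical logging API. A logger obtained by package name and a logger obtained by class name are different objects. Removing an appender from the class logger therefore leaves it attached to the package logger. ZooKeeper used log4j at the time; the sketch below uses `java.util.logging` as a stdlib stand-in, and the package and class names are hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;

class LoggerNameMismatch {
    // Adds a handler to the package logger, then removes it from either the
    // same logger (matchNames = true) or the class logger (matchNames = false).
    // Returns how many handlers were left on the package logger.
    static int leftoverHandlers(boolean matchNames) {
        String pkg = "org.example.quorum";        // hypothetical package name
        String cls = pkg + ".QuorumPeerExample";  // hypothetical class name
        Logger pkgLogger = Logger.getLogger(pkg);
        Handler appender = new ConsoleHandler();
        pkgLogger.addHandler(appender);

        // removeHandler is silently a no-op when the handler is not attached.
        Logger.getLogger(matchNames ? pkg : cls).removeHandler(appender);

        int left = pkgLogger.getHandlers().length;
        pkgLogger.removeHandler(appender); // tidy up so repeated calls start clean
        return left;
    }
}
```

With mismatched names, the handler survives "cleanup" and keeps receiving records from the class logger through the parent chain. This can leak output into later tests.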

 electionAlg should default to FLE (3) - regression
 --

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch


 There's a regression in 3.2: electionAlg no longer defaults to 3 
 (it incorrectly defaults to 0).
 We also need tests to validate this.
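The sketch below shows the intended default together with the kind of check the issue asks for. `ElectionConfig` is hypothetical and is not ZooKeeper's config parser. It only models reading the `electionAlg` key from a `java.util.Properties` and falling back to 3 (FLE) when the key is absent.

```java
import java.util.Properties;

// Hypothetical fragment of a config parser; only electionAlg handling is shown.
class ElectionConfig {
    static final int DEFAULT_ELECTION_ALG = 3; // FLE, the intended default

    // Returns the configured algorithm, or the default when the key is missing.
    static int electionAlg(Properties cfg) {
        String v = cfg.getProperty("electionAlg");
        return v == null ? DEFAULT_ELECTION_ALG : Integer.parseInt(v.trim());
    }
}
```

A regression test then asserts that an empty configuration yields 3, and that an explicit value is still honoured.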




[jira] Commented: (ZOOKEEPER-505) testAsyncCreateClose is badly broken

2009-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12741000#action_12741000
 ] 

Hadoop QA commented on ZOOKEEPER-505:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415968/ZOOKEEPER-505.1.patch
  against trunk revision 802188.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/181/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/181/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/181/console






[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

fixed patch to apply cleanly.

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch


 Here is the part of the log where my ZooKeeper instance crashed, taking 3 
 out of 5 down and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161355 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:34032]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d516011c NIOServerCnxn: