[jira] Commented: (ZOOKEEPER-502) bookkeeper create calls completion too many times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740846#action_12740846 ]

Utkarsh Srivastava commented on ZOOKEEPER-502:

The cause is the following: in Action 1 of createLedger, 4 calls to zk are fired off: one getChildren() and 3 create()s. The callback for the getChildren() moves the op into Action 2, which is not correct because the 3 creates are still pending. The create callback then just assumes that the op is already in Action 2, and hence advances it to Action 3. Now the same op is enqueued twice, and its action has been set to 3. The double queuing explains why the callback is called twice. Since Action 2 is skipped altogether in this sequence of events, we end up with 0 bookies. If the dequeuer happens to dequeue before the action is set to 3, then Action 2 will also be carried out, which explains why we get 0 bookies only sometimes and not always (which explains ZOOKEEPER-503).

In general, this style of asynchronous programming with stage numbers is error-prone and hard to read. Object creation is cheap, and operations like openLedger and createLedger are rare. Why not just create anonymous inner classes as callbacks instead of doing this state machine?

bookkeeper create calls completion too many times

Key: ZOOKEEPER-502
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
Project: Zookeeper
Issue Type: Bug
Components: contrib-bookkeeper
Reporter: Benjamin Reed
Assignee: Flavio Paiva Junqueira
Attachments: ZOOKEEPER-502.patch

When calling the asynchronous version of create, the completion routine is called more than once.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
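
To illustrate the suggestion in the ZOOKEEPER-502 comment, here is a minimal sketch of firing several asynchronous calls and advancing exactly once, after the last callback has arrived. It is not the BookKeeper code: CreateLedgerSketch, Callback, fireAll and runOnce are hypothetical names, and a thread pool stands in for the ZooKeeper client. Each call gets its own anonymous callback, and a shared counter replaces the stage number.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the four asynchronous zk calls; not the BookKeeper code. */
class CreateLedgerSketch {

    /** Hypothetical callback type, loosely modelled on an async completion. */
    interface Callback {
        void done(int rc);
    }

    /**
     * Fires fanOut simulated async calls and runs next exactly once,
     * after the last of them has completed.
     */
    static void fireAll(ExecutorService pool, int fanOut, final Runnable next) {
        final AtomicInteger pending = new AtomicInteger(fanOut);
        for (int i = 0; i < fanOut; i++) {
            // One anonymous callback per call; the callbacks share only the counter.
            final Callback cb = new Callback() {
                public void done(int rc) {
                    // Only the callback that brings the counter to zero advances.
                    if (pending.decrementAndGet() == 0) {
                        next.run();
                    }
                }
            };
            // Simulated async completion on a pool thread (assumption: always rc = 0).
            pool.execute(new Runnable() {
                public void run() {
                    cb.done(0);
                }
            });
        }
    }

    /** Runs the sketch once and returns how many times the next step ran. */
    static int runOnce() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        final AtomicInteger nextStepRuns = new AtomicInteger();
        final CountDownLatch finished = new CountDownLatch(1);
        // 4 calls, as in the comment: one getChildren() and three create()s.
        fireAll(pool, 4, new Runnable() {
            public void run() {
                nextStepRuns.incrementAndGet();
                finished.countDown();
            }
        });
        boolean completed = finished.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        if (!completed) {
            throw new IllegalStateException("callbacks did not complete");
        }
        return nextStepRuns.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("next step ran " + runOnce() + " time(s)");
    }
}
```

Because only the decrement that reaches zero triggers the continuation, the order in which the callbacks arrive does not matter, and the next step cannot be enqueued twice.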
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740848#action_12740848 ]

Hadoop QA commented on ZOOKEEPER-498:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12415916/ZOOKEEPER-498.patch
against trunk revision 802188.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/console

This message is automatically generated.

Unending Leader Elections : WAN configuration

Key: ZOOKEEPER-498
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
Project: Zookeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.2.0
Environment:
  Each machine:
    CentOS 5.2 64-bit
    2GB ram
    java version 1.6.0_13
    Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
    Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed
  Network Topology:
    DC : central data center
    POD(N): remote data center
  Zookeeper Topology:
    Leaders may be elected only in DC (weight = 1)
    Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
Fix For: 3.2.1, 3.3.0
Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch

In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups: a central DC group of ZK servers that have a voting weight of 1, and a group of servers in remote pods with a voting weight of 0.

What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1.
[jira] Commented: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740890#action_12740890 ]

Hudson commented on ZOOKEEPER-490:

Integrated in ZooKeeper-trunk #409 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/409/]). the java docs for session creation are misleading/incomplete

the java docs for session creation are misleading/incomplete

Key: ZOOKEEPER-490
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.2.1, 3.3.0
Attachments: ZOOKEEPER-490.patch

The javadoc for the ZooKeeper constructor says:

 * The client object will pick an arbitrary server and try to connect to it.
 * If failed, it will try the next one in the list, until a connection is
 * established, or all the servers have been tried.

The "or all the servers have been tried" phrase is misleading; it should indicate that we retry until success, connection closed, or session expired.

We also need to mention that the connection is asynchronous: the constructor returns immediately, and you need to look for the connection event in the watcher.
[jira] Commented: (ZOOKEEPER-501) CnxManagerTest failed on hudson
[ https://issues.apache.org/jira/browse/ZOOKEEPER-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740891#action_12740891 ]

Hudson commented on ZOOKEEPER-501:

Integrated in ZooKeeper-trunk #409 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/409/]). CnxManagerTest failed on hudson. (flavio via mahadev)

CnxManagerTest failed on hudson

Key: ZOOKEEPER-501
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-501
Project: Zookeeper
Issue Type: Bug
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Fix For: 3.2.1, 3.3.0
Attachments: CnxnManagerTest.log, ZOOKEEPER-501.patch

It timed out according to the console output:
http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/406/testReport/org.apache.zookeeper.test/CnxManagerTest/testCnxManager/
[jira] Created: (ZOOKEEPER-505) testAsyncCreateClose is badly broken
testAsyncCreateClose is badly broken

Key: ZOOKEEPER-505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-505
Project: Zookeeper
Issue Type: Bug
Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava
Priority: Critical

The test case testAsyncCreateClose is badly broken. I was wondering why all the unit tests are passing in spite of my having found so many different problems with LedgerManagementProcessor.

There is a big try-catch block sitting in the test case that catches all exceptions, prints their stack traces, and exits, thereby allowing the test to pass. In general, unit tests shouldn't catch exceptions unless the exception is one you expect to happen.

Another problem is that the same ControlObject is used for synchronization throughout. Since we already have the problem of callbacks being called multiple times (ZOOKEEPER-502), notify() on the control object is called too many times, resulting in the unit test not waiting for certain callbacks. Thus the test never waits for the asyncOpenLedger() to finish, and hence still succeeds. I believe asyncOpenLedger() has never worked right.
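
The two problems described in ZOOKEEPER-505 can be illustrated with a small sketch. It is not the BookKeeper test and does not use JUnit: AsyncTestSketch, asyncOp, awaitOrFail and testAsyncCreateThenOpen are hypothetical names, and asyncOp only simulates an asynchronous call by running the callback on another thread. The test body lets exceptions propagate instead of catching them, and it waits on a separate latch per callback, so a duplicate create callback cannot release the wait that belongs to the open callback.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; asyncOp is a stand-in, not the BookKeeper API. */
class AsyncTestSketch {

    /** Simulated async operation: invokes the callback on another thread. */
    static void asyncOp(Runnable callback) {
        new Thread(callback).start();
    }

    /** Fails loudly if the callback guarded by the latch does not arrive in time. */
    static void awaitOrFail(CountDownLatch latch, String what, long timeoutMs)
            throws InterruptedException {
        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new AssertionError(what + " callback was never invoked");
        }
    }

    /**
     * Test body: declares "throws Exception" instead of wrapping everything
     * in try-catch, so any failure reaches the test runner and fails the test.
     */
    static void testAsyncCreateThenOpen() throws Exception {
        // One latch per callback rather than one shared control object.
        final CountDownLatch created = new CountDownLatch(1);
        final CountDownLatch opened = new CountDownLatch(1);

        asyncOp(new Runnable() {
            public void run() {
                created.countDown();
                created.countDown(); // simulated duplicate callback; a no-op at zero
            }
        });
        awaitOrFail(created, "create", 5000);

        asyncOp(new Runnable() {
            public void run() {
                opened.countDown();
            }
        });
        // The duplicate create callback above cannot satisfy this wait.
        awaitOrFail(opened, "open", 5000);
    }

    public static void main(String[] args) throws Exception {
        testAsyncCreateThenOpen();
        System.out.println("both callbacks were observed");
    }
}
```

With a shared wait/notify object, the extra notification from the duplicate callback could wake the test before the open had finished; a latch that has already reached zero simply ignores further countDown() calls.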
[jira] Updated: (ZOOKEEPER-505) testAsyncCreateClose is badly broken
[ https://issues.apache.org/jira/browse/ZOOKEEPER-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Utkarsh Srivastava updated ZOOKEEPER-505:

Attachment: ZOOKEEPER-505.1.patch

- Fix the unit test to not catch exceptions
- Actually wait for openCallback to be called by using a new control object

testAsyncCreateClose is badly broken

Key: ZOOKEEPER-505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-505
Project: Zookeeper
Issue Type: Bug
Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava
Priority: Critical
Attachments: ZOOKEEPER-505.1.patch

The test case testAsyncCreateClose is badly broken. I was wondering why all the unit tests are passing in spite of my having found so many different problems with LedgerManagementProcessor.

There is a big try-catch block sitting in the test case that catches all exceptions, prints their stack traces, and exits, thereby allowing the test to pass. In general, unit tests shouldn't catch exceptions unless the exception is one you expect to happen.

Another problem is that the same ControlObject is used for synchronization throughout. Since we already have the problem of callbacks being called multiple times (ZOOKEEPER-502), notify() on the control object is called too many times, resulting in the unit test not waiting for certain callbacks. Thus the test never waits for the asyncOpenLedger() to finish, and hence still succeeds. I believe asyncOpenLedger() has never worked right.
[jira] Updated: (ZOOKEEPER-505) testAsyncCreateClose is badly broken
[ https://issues.apache.org/jira/browse/ZOOKEEPER-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Utkarsh Srivastava updated ZOOKEEPER-505:

Status: Patch Available (was: Open)

testAsyncCreateClose is badly broken

Key: ZOOKEEPER-505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-505
Project: Zookeeper
Issue Type: Bug
Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava
Priority: Critical
Attachments: ZOOKEEPER-505.1.patch

The test case testAsyncCreateClose is badly broken. I was wondering why all the unit tests are passing in spite of my having found so many different problems with LedgerManagementProcessor.

There is a big try-catch block sitting in the test case that catches all exceptions, prints their stack traces, and exits, thereby allowing the test to pass. In general, unit tests shouldn't catch exceptions unless the exception is one you expect to happen.

Another problem is that the same ControlObject is used for synchronization throughout. Since we already have the problem of callbacks being called multiple times (ZOOKEEPER-502), notify() on the control object is called too many times, resulting in the unit test not waiting for certain callbacks. Thus the test never waits for the asyncOpenLedger() to finish, and hence still succeeds. I believe asyncOpenLedger() has never worked right.
[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression
[ https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-499:

Status: Open (was: Patch Available)

This looks good, Pat, but when you first get the logger, why are you using the package name? If you are going to use the package name, shouldn't you get the package from the class file? In the second test, you get the logger using a package to add an appender, but remove the appender using the class. Couldn't that potentially cause a problem?

electionAlg should default to FLE (3) - regression

Key: ZOOKEEPER-499
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
Project: Zookeeper
Issue Type: Bug
Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
Fix For: 3.2.1, 3.3.0
Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch

There's a regression in 3.2: electionAlg is no longer defaulting to 3 (it incorrectly defaults to 0). We also need tests to validate this.
[jira] Commented: (ZOOKEEPER-505) testAsyncCreateClose is badly broken
[ https://issues.apache.org/jira/browse/ZOOKEEPER-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12741000#action_12741000 ]

Hadoop QA commented on ZOOKEEPER-505:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12415968/ZOOKEEPER-505.1.patch
against trunk revision 802188.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/181/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/181/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/181/console

testAsyncCreateClose is badly broken

Key: ZOOKEEPER-505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-505
Project: Zookeeper
Issue Type: Bug
Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava
Priority: Critical
Attachments: ZOOKEEPER-505.1.patch

The test case testAsyncCreateClose is badly broken. I was wondering why all the unit tests are passing in spite of my having found so many different problems with LedgerManagementProcessor.

There is a big try-catch block sitting in the test case that catches all exceptions, prints their stack traces, and exits, thereby allowing the test to pass. In general, unit tests shouldn't catch exceptions unless the exception is one you expect to happen.

Another problem is that the same ControlObject is used for synchronization throughout. Since we already have the problem of callbacks being called multiple times (ZOOKEEPER-502), notify() on the control object is called too many times, resulting in the unit test not waiting for certain callbacks. Thus the test never waits for the asyncOpenLedger() to finish, and hence still succeeds. I believe asyncOpenLedger() has never worked right.
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-483:

Attachment: ZOOKEEPER-483.patch

fixed patch to apply cleanly.

ZK fataled on me, and ugly

Key: ZOOKEEPER-483
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
Fix For: 3.2.1, 3.3.0
Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch

here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients:

2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error
2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489]
2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797]
2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998]
2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error
2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138]
2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257]
2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:34032]
2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d516011c NIOServerCnxn: