[jira] Created: (ZOOKEEPER-901) Redesign of QuorumCnxManager
Redesign of QuorumCnxManager
Key: ZOOKEEPER-901
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901
Project: Zookeeper
Issue Type: Improvement
Components: leaderElection
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Fix For: 3.4.0

QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following:
# Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers;
# Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
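The two goals above can be sketched with plain java.nio: one thread owns one Selector and starts a non-blocking connect to every peer, so a slow or unreachable peer costs only a pending key rather than a blocked thread. This is a minimal, hypothetical sketch (SelectorConnector and connectAll are invented names, not QuorumCnxManager's API) that covers connection establishment only, not the exchange of election messages.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not QuorumCnxManager code: a single thread drives
// non-blocking connects to every peer through one Selector.
class SelectorConnector {

    /** Returns open, connected channels keyed by peer; failed or timed-out peers are omitted. */
    static Map<InetSocketAddress, SocketChannel> connectAll(List<InetSocketAddress> peers,
                                                            long timeoutMs) throws IOException {
        Map<InetSocketAddress, SocketChannel> connected = new HashMap<>();
        List<SocketChannel> pending = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (InetSocketAddress peer : peers) {
                SocketChannel ch = SocketChannel.open();
                ch.configureBlocking(false);            // connect() returns immediately
                try {
                    if (ch.connect(peer)) {
                        connected.put(peer, ch);        // connection completed at once
                    } else {
                        ch.register(selector, SelectionKey.OP_CONNECT, peer);
                        pending.add(ch);
                    }
                } catch (IOException e) {
                    ch.close();                         // failed at once, e.g. refused
                }
            }
            long deadline = System.currentTimeMillis() + timeoutMs;
            int outstanding = pending.size();
            while (outstanding > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break;                              // give up on peers still pending
                }
                selector.select(remaining);
                for (SelectionKey key : selector.selectedKeys()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    InetSocketAddress peer = (InetSocketAddress) key.attachment();
                    key.cancel();                       // stop watching this channel
                    outstanding--;
                    try {
                        if (ch.finishConnect()) {
                            connected.put(peer, ch);
                        }
                    } catch (IOException e) {
                        // this peer failed; the other peers are unaffected
                    }
                }
                selector.selectedKeys().clear();
            }
        } finally {
            // close every pending channel that did not end up connected (failed or timed out)
            for (SocketChannel ch : pending) {
                if (!connected.containsValue(ch)) {
                    ch.close();
                }
            }
        }
        return connected;
    }
}
```

For example, connectAll(peers, 5000) returns within roughly five seconds even if some peers never answer. The returned channels are still in non-blocking mode; a real implementation would presumably keep them registered with a long-lived selector for reads and writes instead of closing the selector as this sketch does.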
Testing for Failure in the Cloud: FATE and DESTINI
research that produces real tools, which help developers find (and then fix) real failure-handling bugs, including 16 new bug reports to HDFS (7 design bugs and 9 implementation bugs). Pretty nice, given the intricacies of failure-recovery protocols. Has anyone heard of this? It's the first time I've seen it. He mentions some preliminary results with ZK, but I've yet to hear anything about them: http://databeta.wordpress.com/2010/10/15/testing-for-failure-in-the-cloud-fate-and-destini/ Patrick
Re: Restarting discussion on ZooKeeper as a TLP
Good to see we are in agreement on this. Thanks to everyone who voted. Looks like this is unanimous at this point. I will start the proceedings in the Hadoop PMC to make ZooKeeper a TLP. Patrick On Thu, Oct 14, 2010 at 5:37 PM, Flavio Junqueira f...@yahoo-inc.com wrote: +1. Frankly, I don't see concrete benefits for the community with ZooKeeper becoming a TLP, but perhaps they will become clear over time. Now, it is certainly cool to have our own top-level domain: http://zookeeper.apache.org/ rocks! -Flavio On Oct 14, 2010, at 1:00 PM, Benjamin Reed wrote: +1 ben On 10/14/2010 11:47 AM, Henry Robinson wrote: +1, I agree that we've addressed most outstanding concerns; we're ready for TLP. Henry On 14 October 2010 13:29, Mahadev Konar maha...@yahoo-inc.com wrote: +1 for moving to TLP. Thanks for starting the vote, Pat. mahadev On 10/13/10 2:10 PM, Patrick Hunt ph...@apache.org wrote: In March of this year we discussed a request from the Apache Board and the Hadoop PMC that we become a TLP rather than a subproject of Hadoop: Original discussion http://markmail.org/thread/42cobkpzlgotcbin I originally voted against this move, my primary concern being that we were not ready to move to TLP status given our small contributor base and limited contributor diversity. However, I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from the board/PMC addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status. A second concern was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see this with the other projects that have moved to TLP status: pig/hive/hbase/etc...) that this connection will be maintained.
The Hadoop ZooKeeper tab, for example, will redirect to our new homepage. Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper, and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction. I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles. Regards, Patrick *flavio* *junqueira* research scientist f...@yahoo-inc.com
[jira] Commented: (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921921#action_12921921 ] Patrick Hunt commented on ZOOKEEPER-901: Thoughts regarding netty support? We've been adding netty support to the client-server connection mechanisms. My intent was to eventually modify the server-server connections (quorum/election) similarly. You might want to consider this when refactoring -- either adding directly or just making sure it will be easy(ier) to add netty eventually.
[jira] Issue Comment Edited: (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921921#action_12921921 ] Patrick Hunt edited comment on ZOOKEEPER-901 at 10/17/10 7:21 PM: -- Thoughts regarding netty support? We've been adding netty support to the client - server connection mechanisms. My intent was to eventually modify the server - server connections (quorum/election) similarly. You might want to consider this when refactoring -- either adding directly or just making sure it will be easy(ier) to add netty eventually.
[jira] Commented: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921923#action_12921923 ]

Michi Mutsuzaki commented on ZOOKEEPER-804:
---

+1. I can open a new bug and submit a patch that way if it's preferred.

No worries, it's not a big deal since this is a one-line change. Thanks again, Jared!

--Michi

c unit tests failing due to assertion cptr failed
---
Key: ZOOKEEPER-804
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-804
Project: Zookeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.4.0
Environment: gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel)
Reporter: Patrick Hunt
Assignee: Michi Mutsuzaki
Priority: Critical
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-804-1.patch, ZOOKEEPER-804.patch

I'm seeing this frequently:

[exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK
[exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK
[exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK
[exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK
[exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed.
[exec] make: *** [run-check] Aborted
[exec] Zookeeper_simpleSystem::testHangingClient

Mahadev, can you take a look?
Re: What's the QA strategy of ZooKeeper?
Hi Vishal, thanks for the list. As you can see, when we do find issues we do our best to address them and increase testing in that area. Unfortunately our testing regime, while extensive, is not exhaustive. You can see the clover coverage reports here, btw: https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/clover/ We'd love to see further contributions around testing. Thomas has opened some discussion around code refactoring, and I'm hopeful that it will increase the coverage and enable the design for test that we lack in some cases. Patrick On Fri, Oct 15, 2010 at 12:24 PM, Vishal K vishalm...@gmail.com wrote: Hi Patrick, On Fri, Oct 15, 2010 at 2:22 PM, Patrick Hunt ph...@apache.org wrote: Recently, we have run into issues in ZK that I believe should have been caught by some basic testing before the release. Vishal, can you be more specific? Pointing out the specific JIRAs that you entered would be very valuable. Don't worry about hurting our feelings or anything; without this type of feedback we can't address the specific issues and their underlying problems. Here's a list of a few issues:
Leader election taking a long time to complete - https://issues.apache.org/jira/browse/ZOOKEEPER-822
Last processed zxid set prematurely while establishing leadership - https://issues.apache.org/jira/browse/ZOOKEEPER-790
FLE implementation should be improved to use non-blocking sockets - ZOOKEEPER-900
ZK lets any node to become an observer - https://issues.apache.org/jira/browse/ZOOKEEPER-851
Regards, Patrick On Fri, Oct 15, 2010 at 11:14 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Well said, Vishal. I really like the points you put forth!!! Agree on all the points, but again, all the points you mention require commitment from folks like you. It's a pretty hard task to test all the corner cases of ZooKeeper. I'd expect everyone to pitch in for testing a release. We should definitely work towards a plan. You should go ahead and create a JIRA for the QA plan.
We should all pitch in with what should be tested. Thanks mahadev On 10/15/10 7:32 AM, Vishal K vishalm...@gmail.com wrote: Hi, I would like to add my two cents here. I would suggest staying away from code cleanup unless it is absolutely necessary. I would also like to extend this discussion to understand the amount of testing/QA to be performed before a release. How do we currently qualify a release? Recently, we have run into issues in ZK that I believe should have been caught by some basic testing before the release. I will be honest in saying that, unfortunately, these bugs have resulted in questions being raised by several people in our organization about our choice of using ZooKeeper. Nevertheless, our product group really thinks that ZK is a cool technology, but we need to focus on making it robust before adding major new features to it. I would suggest that we:
1. Look at current bugs, see why existing tests did not uncover these bugs, and improve those tests.
2. Look at places that need more tests and broadcast them to the community. Follow up with test development.
3. Have a crisp QA strategy for each release.
4. Improve API documentation as well as code documentation so that API usage is clear and debugging is easier.
Comments? Thanks. -Vishal On Fri, Oct 15, 2010 at 9:44 AM, Thomas Koch tho...@koch.ro wrote: Hi Benjamin, thank you for your response. Please find some comments inline. Benjamin Reed: code quality is important, and there are things we should keep in mind, but in general i really don't like the idea of risking code breakage because of a gratuitous code cleanup. we should be watching out for these things when patches get submitted or when new things go in. I didn't want to say it that clearly, but the new Netty code in particular, on both the client and server side, is IMHO an example of new code in very bad shape. The client code patch even changes the FindBugs configuration to exclude the new code from the FindBugs checks.
i think this is in line with what pat was saying. just to expand a bit. in my opinion cleanup refactorings have the following problems: 1) you risk breaking things in production for a potential future maintenance advantage. If your code is already in such bad shape that every change carries considerable risk of breaking something, then you are already in trouble. With every new feature (or bugfix!) you also risk breaking something. If you don't have the attitude of permanent refactoring to improve the code quality, you will inevitably lower the maintainability of your code with every new feature. New
[jira] Updated: (ZOOKEEPER-820) update c unit tests to ensure zombie java server processes don't cause failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated ZOOKEEPER-820:
--
Attachment: ZOOKEEPER-820.patch

Uses "which" to check whether the lsof command is present. If it is, uses lsof to see whether a process is listening on port 22181 and, if so, kills it.

--Michi

update c unit tests to ensure zombie java server processes don't cause failure
Key: ZOOKEEPER-820
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-820
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Patrick Hunt
Assignee: Michi Mutsuzaki
Priority: Critical
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-820-1.patch, ZOOKEEPER-820.patch, ZOOKEEPER-820.patch, ZOOKEEPER-820.patch

When the C unit tests are run, the server sometimes doesn't shut down at the end of the test; this causes subsequent tests (especially on Hudson) to fail.
1) We should try harder to make the server shut down at the end of the test. I suspect this is related to test failure/cleanup.
2) Before the tests are run, we should check whether the old server is still running and try to shut it down.
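The patch above performs its check from the shell with lsof. A test harness written in Java could run a similar pre-flight probe by simply attempting a TCP connection to the test port. This is a swapped-in technique, not what the ZOOKEEPER-820 patch does, and unlike the patch it only detects a leftover listener; it does not kill it. StaleServerCheck and isListening are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-test guard; NOT part of the ZOOKEEPER-820 patch (which uses lsof).
class StaleServerCheck {

    /** Returns true if something accepts TCP connections on host:port within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;                 // a listener (possibly a zombie server) is present
        } catch (IOException e) {
            return false;                // refused, timed out, or unreachable
        }
    }
}
```

For example, a harness might call StaleServerCheck.isListening("127.0.0.1", 22181, 1000) before starting the test server and either fail fast or attempt cleanup if it returns true.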
[jira] Updated: (ZOOKEEPER-820) update c unit tests to ensure zombie java server processes don't cause failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michi Mutsuzaki updated ZOOKEEPER-820: -- Status: Patch Available (was: Open)
Running a single unit test
Hello,

How do I run a single unit test? I tried this:

$ ant test -Dtest=SessionTest

but it still runs all the tests.

Thanks!
--Michi
[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921939#action_12921939 ]

Michi Mutsuzaki commented on ZOOKEEPER-794:
---

ZOOKEEPER-794_5.patch.txt doesn't compile. I'm getting these errors:

[javac] branch-3.3/src/java/test/org/apache/zookeeper/test/SessionTest.java:201: cannot find symbol
[javac] symbol : variable Assert
[javac] location: class org.apache.zookeeper.test.SessionTest
[javac] Assert.fail("Should have received a SessionExpiredException");
[javac] ^
[javac] branch-3.3/src/java/test/org/apache/zookeeper/test/SessionTest.java:217: cannot find symbol
[javac] symbol : variable Assert
[javac] location: class org.apache.zookeeper.test.SessionTest
[javac] Assert.assertEquals(KeeperException.Code.SESSIONEXPIRED.toString(), cb.toString());
[javac] ^

We need to either:
a. import org.junit.Assert, or
b. use fail/assertEquals instead of Assert.fail/Assert.assertEquals.

--Michi

Callbacks are not invoked when the client is closed
---
Key: ZOOKEEPER-794
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
Project: Zookeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.3.1
Reporter: Alexis Midon
Assignee: Alexis Midon
Priority: Blocker
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch, ZOOKEEPER-794_4.patch.txt, ZOOKEEPER-794_5.patch.txt

I noticed that ZooKeeper behaves differently when synchronous and asynchronous operations are called on a closed ZooKeeper client. A synchronous call throws a session expired exception, while an asynchronous call does nothing: no exception, no callback invocation. Even if the EventThread receives the Packet with the session expired error code, the packet is never processed because the thread has already been killed by the eventOfDeath, so the callback is not invoked.
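The failure mode described in ZOOKEEPER-794 (a callback queued to an event thread that has already stopped is silently dropped) can be illustrated with a simplified model. This is a hypothetical sketch: DrainingEventThread, queueCallback and kill are names invented here, callbacks are plain Runnables rather than ZooKeeper Packet objects, and the thread above does not show which fix was eventually committed. It shows one possible approach: after the death sentinel the thread drains whatever is already queued, and once it has exited, late callbacks run on the caller's thread instead of being queued to a thread that will never process them.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified model of a client event thread; not ZooKeeper's ClientCnxn.
class DrainingEventThread extends Thread {
    // Sentinel that asks the thread to stop once the queue has drained
    private static final Runnable EVENT_OF_DEATH = () -> { };

    private final LinkedBlockingQueue<Runnable> waitingEvents = new LinkedBlockingQueue<>();
    private final Object lock = new Object();
    private boolean isRunning = true;   // guarded by lock
    private boolean wasKilled = false;  // only touched by this thread

    /** Queues a callback, or runs it on the caller's thread if the event thread has exited. */
    void queueCallback(Runnable callback) {
        synchronized (lock) {
            if (isRunning) {
                waitingEvents.add(callback);
                return;
            }
        }
        callback.run();   // thread is gone: invoke directly instead of dropping it
    }

    /** Asks the thread to exit after it has processed everything queued so far. */
    void kill() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Runnable event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    wasKilled = true;           // keep draining instead of returning at once
                } else {
                    event.run();
                }
                if (wasKilled) {
                    synchronized (lock) {
                        if (waitingEvents.isEmpty()) {
                            isRunning = false;  // later callbacks run inline in queueCallback
                            break;
                        }
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            synchronized (lock) {
                isRunning = false;              // events still queued at this point are abandoned
            }
        }
    }
}
```

The lock makes the isRunning check plus enqueue atomic with respect to the thread's final empty-queue check, so in this model a callback is either processed by the thread or run inline, never lost.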
Re: Running a single unit test
You need to use -Dtestcase, not -Dtest, as per below:

ant test -Dtestcase=YourTestHere

HTH,
Henry

--
Henry Robinson
Software Engineer
Cloudera