[jira] Commented: (ZOOKEEPER-864) Hedwig C++ client improvements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908666#action_12908666 ]

Ivan Kelly commented on ZOOKEEPER-864:

Yes, you are absolutely right. Unless we don't install protocol.h at all and try to remove it from the hedwig public headers. I remember the protocol.h stuff was causing you problems recently so this could resolve that issue too.

Hedwig C++ client improvements
Key: ZOOKEEPER-864
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-864
Project: Zookeeper
Issue Type: Improvement
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Fix For: 3.4.0
Attachments: warnings.txt, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff

I changed the socket code to use boost asio. Now the client only creates one thread, and all operations are non-blocking. Tests are now automated, just run make check.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
Build failed in Hudson: ZooKeeper-trunk #934
See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/934/ -- [...truncated 168605 lines...] [junit] 2010-09-13 10:52:44,182 [myid:] - INFO [main:quorumb...@195] - 127.0.0.1:11237 is accepting client connections [junit] 2010-09-13 10:52:44,182 [myid:] - INFO [main:clientb...@225] - connecting to 127.0.0.1 11238 [junit] 2010-09-13 10:52:44,183 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11238:nioservercnxnfact...@196] - Accepted socket connection from /127.0.0.1:44910 [junit] 2010-09-13 10:52:44,183 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11238:nioserverc...@791] - Processing stat command from /127.0.0.1:44910 [junit] 2010-09-13 10:52:44,183 [myid:] - INFO [Thread-359:nioservercnxn$statcomm...@645] - Stat command output [junit] 2010-09-13 10:52:44,184 [myid:] - INFO [Thread-359:nioserverc...@967] - Closed socket connection for client /127.0.0.1:44910 (no session established for client) [junit] 2010-09-13 10:52:44,184 [myid:] - INFO [main:quorumb...@195] - 127.0.0.1:11238 is accepting client connections [junit] 2010-09-13 10:52:44,184 [myid:] - INFO [main:clientb...@225] - connecting to 127.0.0.1 11239 [junit] 2010-09-13 10:52:44,185 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - Accepted socket connection from /127.0.0.1:46017 [junit] 2010-09-13 10:52:44,185 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing stat command from /127.0.0.1:46017 [junit] 2010-09-13 10:52:44,201 [myid:] - INFO [Thread-360:nioserverc...@967] - Closed socket connection for client /127.0.0.1:46017 (no session established for client) [junit] 2010-09-13 10:52:44,452 [myid:] - INFO [main:clientb...@225] - connecting to 127.0.0.1 11239 [junit] 2010-09-13 10:52:44,452 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - Accepted socket connection from /127.0.0.1:46018 [junit] 2010-09-13 10:52:44,452 [myid:] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing stat command from /127.0.0.1:46018 [junit] 2010-09-13 10:52:44,453 [myid:] - INFO [Thread-361:nioserverc...@967] - Closed socket connection for client /127.0.0.1:46018 (no session established for client) [junit] 2010-09-13 10:52:44,703 [myid:] - INFO [main:clientb...@225] - connecting to 127.0.0.1 11239 [junit] 2010-09-13 10:52:44,703 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - Accepted socket connection from /127.0.0.1:46019 [junit] 2010-09-13 10:52:44,704 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing stat command from /127.0.0.1:46019 [junit] 2010-09-13 10:52:44,704 [myid:] - INFO [Thread-362:nioserverc...@967] - Closed socket connection for client /127.0.0.1:46019 (no session established for client) [junit] 2010-09-13 10:52:44,954 [myid:] - INFO [main:clientb...@225] - connecting to 127.0.0.1 11239 [junit] 2010-09-13 10:52:44,955 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - Accepted socket connection from /127.0.0.1:46020 [junit] 2010-09-13 10:52:44,955 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing stat command from /127.0.0.1:46020 [junit] 2010-09-13 10:52:44,955 [myid:] - INFO [Thread-363:nioserverc...@967] - Closed socket connection for client /127.0.0.1:46020 (no session established for client) [junit] 2010-09-13 10:52:45,206 [myid:] - INFO [main:clientb...@225] - connecting to 127.0.0.1 11239 [junit] 2010-09-13 10:52:45,206 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - Accepted socket connection from /127.0.0.1:46021 [junit] 2010-09-13 10:52:45,206 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing stat command from /127.0.0.1:46021 [junit] 2010-09-13 10:52:45,207 [myid:] - INFO [Thread-364:nioserverc...@967] - Closed socket 
connection for client /127.0.0.1:46021 (no session established for client) [junit] 2010-09-13 10:52:45,457 [myid:] - INFO [main:clientb...@225] - connecting to 127.0.0.1 11239 [junit] 2010-09-13 10:52:45,457 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - Accepted socket connection from /127.0.0.1:46022 [junit] 2010-09-13 10:52:45,457 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing stat command from /127.0.0.1:46022 [junit] 2010-09-13 10:52:45,458 [myid:] - INFO [Thread-365:nioservercnxn$statcomm...@645] - Stat command output [junit] 2010-09-13 10:52:45,458 [myid:] - INFO [Thread-365:nioserverc...@967] - Closed socket connection for client /127.0.0.1:46022 (no session established for client) [junit]
[jira] Commented: (ZOOKEEPER-831) BookKeeper: Throttling improved for reads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908724#action_12908724 ]

Ivan Kelly commented on ZOOKEEPER-831:

submitOrdered can throw RejectedExecutionException (I'm guessing this is rare, to the order of "if this happens the machine will die soon") or NullPointerException (unlikely). However, there's no harm in putting a try { } catch (Exception e) { opCounterSem.release(); } around it. The handler will only run if the job is never submitted. Also, you need a release inside the if (metadata.isClosed). Otherwise I think it should be fine, as PendingAddOp should be able to handle it no matter what error occurs.

BookKeeper: Throttling improved for reads
Key: ZOOKEEPER-831
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-831
Project: Zookeeper
Issue Type: Bug
Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Fix For: 3.4.0
Attachments: ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, ZOOKEEPER-831.patch

Reads and writes in BookKeeper are asymmetric: a write request writes one entry, whereas a read request may read multiple entries. The current implementation of throttling only counts the number of read requests instead of counting the number of entries being read. Consequently, a few read requests reading a large number of entries each will spawn a large number of read-entry requests.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
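The try/catch suggested in the comment on ZOOKEEPER-831 can be illustrated with a minimal sketch. This is not the BookKeeper code: the class `ThrottledSubmitter` and its methods are hypothetical, a plain `ExecutorService` stands in for the executor behind `submitOrdered`, and only the name `opCounterSem` is taken from the comment. The point it shows is that a permit acquired before submission has to be released by the caller when the job is never accepted, because the job's own completion path will never run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

// Hypothetical illustration; not the actual BookKeeper classes.
class ThrottledSubmitter {
    private final Semaphore opCounterSem;
    private final ExecutorService executor; // stand-in for the ordered executor

    ThrottledSubmitter(int permits, ExecutorService executor) {
        this.opCounterSem = new Semaphore(permits);
        this.executor = executor;
    }

    /** Takes one permit, then submits the job. Returns false if submission failed. */
    boolean submit(Runnable job) {
        opCounterSem.acquireUninterruptibly();
        try {
            executor.execute(() -> {
                try {
                    job.run();
                } finally {
                    opCounterSem.release(); // normal path: released when the job finishes
                }
            });
            return true;
        } catch (RuntimeException e) {
            // For example RejectedExecutionException or NullPointerException:
            // the job was never submitted, so it can never release the permit itself.
            opCounterSem.release();
            return false;
        }
    }

    int availablePermits() {
        return opCounterSem.availablePermits();
    }
}
```

Submitting to an executor that has already been shut down exercises the failure path: `submit` returns false and the number of available permits is unchanged.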
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abmar Barros updated ZOOKEEPER-702:

Attachment: ZOOKEEPER-702.patch

In this patch:
* Added support for declaring new failure detectors, as Ivan suggested.
* Updated documentation to reflect this.
* Replaced all 'heartbeat' mentions in the API and tests with 'ping'.

GSoC 2010: Failure Detector Model
Key: ZOOKEEPER-702
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
Project: Zookeeper
Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch

Failure Detector Module

Possible Mentor
Henry Robinson (henry at apache dot org)

Requirements
Java, some distributed systems knowledge, comfort implementing distributed systems protocols

Description
ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network).

This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties.

This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
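The 'timeout' method described in the ZOOKEEPER-702 issue text (counting ticks without a heartbeat) can be sketched in a few lines. This is only an illustration under stated assumptions, not ZooKeeper's implementation: the class `TickTimeoutDetector`, its method names, and the rule that a peer is suspected once its missed ticks reach a fixed limit are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of tick-based timeout detection; not ZooKeeper's code.
class TickTimeoutDetector {
    private final int maxMissedTicks;
    private final Map<String, Integer> missedTicks = new HashMap<>();

    TickTimeoutDetector(int maxMissedTicks) {
        this.maxMissedTicks = maxMissedTicks;
    }

    /** Starts monitoring a peer with zero missed ticks. */
    void register(String peer) {
        missedTicks.put(peer, 0);
    }

    /** A heartbeat from a monitored peer resets its missed-tick counter. */
    void heartbeat(String peer) {
        if (missedTicks.containsKey(peer)) {
            missedTicks.put(peer, 0);
        }
    }

    /** Called once per tick: every monitored peer ages by one tick. */
    void tick() {
        missedTicks.replaceAll((peer, missed) -> missed + 1);
    }

    /** A peer is suspected once it has missed maxMissedTicks ticks in a row. */
    boolean isFailed(String peer) {
        Integer missed = missedTicks.get(peer);
        return missed != null && missed >= maxMissedTicks;
    }
}
```

The single fixed threshold is what makes this method hard to tune for unusual networks; the other detectors named in the issue replace that threshold with an adaptive estimate.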
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908817#action_12908817 ]

Ivan Kelly commented on ZOOKEEPER-702:

Code looks good Abmar, good job :) I think the factory code could be improved slightly though. Currently, if you want a custom FD you pass in an FDname and an FDClass. I would merge these two, so that you only pass in FDname to the factory, and the factory can decide whether this is a builtin or a custom. So as an example, FDname can be "fixed", "chen", "phiaccrual" or "com.blah.foobar.MyFailureDetector". When the factory sees this it goes...

    if (fdName.equals("fixed")) {
        fdClass = FixedPingFailureDetector.class;
    } else if (fdName.equals("chen")) {
        fdClass = ChenFailureDetector.class;
    } else if (fdName.equals("bertier")) {
        fdClass = BertierFailureDetector.class;
    } else if (fdName.equals("phiaccrual")) {
        fdClass = PhiAccrualFailureDetector.class;
    } else if (fdName.equals("sliced")) {
        fdClass = SlicedPingFailureDetector.class;
    } else {
        fdClass = Class.forName(fdName);
    }
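The single-name factory lookup proposed in Ivan Kelly's comment on ZOOKEEPER-702 could look roughly like the sketch below. The five detector class names come from the comment, but here they are empty stand-ins so that the example compiles on its own; the `FailureDetector` interface, the `FailureDetectorFactory` class, and the `resolve` method are hypothetical names, and converting a missing class into an `IllegalArgumentException` is an assumption rather than something the patch is known to do.

```java
// Empty stand-ins so the sketch compiles; the real classes live in the patch.
interface FailureDetector {}
class FixedPingFailureDetector implements FailureDetector {}
class ChenFailureDetector implements FailureDetector {}
class BertierFailureDetector implements FailureDetector {}
class PhiAccrualFailureDetector implements FailureDetector {}
class SlicedPingFailureDetector implements FailureDetector {}

// Hypothetical factory: one name selects either a builtin or a custom class.
class FailureDetectorFactory {
    static Class<? extends FailureDetector> resolve(String fdName) {
        switch (fdName) {
            case "fixed":
                return FixedPingFailureDetector.class;
            case "chen":
                return ChenFailureDetector.class;
            case "bertier":
                return BertierFailureDetector.class;
            case "phiaccrual":
                return PhiAccrualFailureDetector.class;
            case "sliced":
                return SlicedPingFailureDetector.class;
            default:
                // Anything else is treated as a fully qualified class name.
                try {
                    return Class.forName(fdName).asSubclass(FailureDetector.class);
                } catch (ClassNotFoundException e) {
                    throw new IllegalArgumentException("Unknown failure detector: " + fdName, e);
                }
        }
    }
}
```

The `asSubclass` call rejects a custom class that does not implement the detector interface at lookup time instead of at first use.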
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908876#action_12908876 ]

Abmar Barros commented on ZOOKEEPER-702:

Hi Ivan, thanks for the feedback. I thought the idea was to declare an alias to a custom failure detector, so that setting its options would be more user friendly, e.g.:

    sessionsFD = myfd
    sessionsFD.myfd = com.softwaresoft.zk.myFailureDetector
    sessionsFD.myfd.blahblah = foobar
    sessionsFD.myfd.myoption = barfoo

In this case, I need both className and fdName parameters in the factory code (and this is how it is implemented in the patch). What do you think?
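The alias-style configuration from Abmar Barros's comment on ZOOKEEPER-702 can be read with `java.util.Properties`. The following is a sketch under assumptions, not the code in the patch: the class `FailureDetectorConfig` and its `parse` method are hypothetical, and the only thing taken from the comment is the key layout (`sessionsFD` names an alias, `sessionsFD.<alias>` names the class, and `sessionsFD.<alias>.<option>` sets an option).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical holder for one aliased failure detector declaration.
class FailureDetectorConfig {
    final String alias;      // e.g. "myfd"
    final String className;  // value of "<key>.<alias>", null if not declared
    final Map<String, String> options = new TreeMap<>();

    FailureDetectorConfig(Properties props, String key) {
        alias = props.getProperty(key);
        if (alias == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        className = props.getProperty(key + "." + alias);
        // Every "<key>.<alias>.<option>" entry becomes an option of this detector.
        String optionPrefix = key + "." + alias + ".";
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(optionPrefix)) {
                options.put(name.substring(optionPrefix.length()), props.getProperty(name));
            }
        }
    }

    /** Parses properties-format text and extracts the detector declared under key. */
    static FailureDetectorConfig parse(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FailureDetectorConfig(props, key);
    }
}
```

With the four lines from the comment as input and `sessionsFD` as the key, the alias is `myfd`, the class name is `com.softwaresoft.zk.myFailureDetector`, and the options map holds `blahblah` and `myoption`. This shows why both an alias and a class name are needed in this scheme: the alias is what scopes the option keys.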
[jira] Updated: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated ZOOKEEPER-804:

Attachment: ZOOKEEPER-804.patch

This patch modifies zookeeper_process() function. After dequeue_completion() is called, it checks if zookeeper_close has been called. If it has been called, it returns ZINVALIDSTATE.

--Michi

c unit tests failing due to assertion cptr failed
Key: ZOOKEEPER-804
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-804
Project: Zookeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.4.0
Environment: gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel)
Reporter: Patrick Hunt
Assignee: Michi Mutsuzaki
Priority: Critical
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-804.patch

I'm seeing this frequently:

[exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK
[exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK
[exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK
[exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK
[exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed.
[exec] make: *** [run-check] Aborted
[exec] Zookeeper_simpleSystem::testHangingClient

Mahadev can you take a look?

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michi Mutsuzaki updated ZOOKEEPER-804: -- Status: Patch Available (was: Open)
[jira] Commented: (ZOOKEEPER-864) Hedwig C++ client improvements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909002#action_12909002 ]

Erwin Tam commented on ZOOKEEPER-864:

+1 Latest patch looks good to me. We can just approve it and push it in first. Thanks for the great work Ivan!
[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-846:

Attachment: ZOOKEEPER-846.patch

This patch fixes the problem by adding a check for close in queuepacket. Note that this also moves the close=true into queuepacket (otherwise the closeSession would never be queued during a close). I spent one day trying to craft a test for this; unfortunately, due to the small size of the timing window, I wasn't able to get a test that failed with this case. However, by inspection it seems to address a clear bug. All tests are passing.

zookeeper client doesn't shut down cleanly on the close call
Key: ZOOKEEPER-846
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
Project: Zookeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.2.2
Reporter: Ted Yu
Assignee: Patrick Hunt
Priority: Blocker
Fix For: 3.3.2, 3.4.0
Attachments: rs-13.stack, ZOOKEEPER-846.patch

Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where the Regionserver process was shutting down and seemed to hang. Here is the bottom of the region server log: http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Here is the relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed:

DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
    - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet)
    at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
    at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
    - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
    at java.lang.Thread.run(Thread.java:619)

main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
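The idea behind the ZOOKEEPER-846 patch note (check for close inside the queueing method, and set the close flag there too) can be shown with a small model. This is a hypothetical stand-in, not the real `ClientCnxn`: the class `OutgoingQueue`, the string-typed requests, and the choice to signal rejection through a boolean return value are all assumptions made for illustration. What it demonstrates is that checking and setting the flag under the same lock lets the closeSession request itself through while refusing anything queued after it, so no caller ends up waiting on a packet that will never be answered.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the client's outgoing queue; not the real ClientCnxn.
class OutgoingQueue {
    static final String CLOSE_SESSION = "closeSession";

    private final Queue<String> pending = new ArrayDeque<>();
    private boolean closing = false;

    /** Returns true if the request was queued, false if close had already started. */
    synchronized boolean queuePacket(String request) {
        if (closing) {
            // Rejected: nothing would ever answer this request,
            // so the caller must not block waiting for a reply.
            return false;
        }
        if (CLOSE_SESSION.equals(request)) {
            // The flag is set here, after the check, so the closeSession
            // request itself is still queued.
            closing = true;
        }
        pending.add(request);
        return true;
    }

    synchronized int size() {
        return pending.size();
    }
}
```

If the flag were set by the caller before queueing, the closeSession request would be rejected along with everything else, which is the situation the patch note describes avoiding.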
[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-846: --- Status: Patch Available (was: Open)
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909119#action_12909119 ]

Flavio Junqueira commented on ZOOKEEPER-702:

Hi Abmar, I believe my comments from Aug 18 have not been addressed. In particular, I still see the same javadoc issues and no test exercising the failure detectors with a running ensemble. I believe the default failure detector is naturally exercised in various tests, but it would be good to have tests that also exercise the other failure detectors in a running ensemble. Is that right?

There are still several references to heartbeat in the javadocs and in the documentation. Should we replace them with ping for consistency?

I'm also seeing some tests failing, like ObserverTest and NioNettySuiteHammerTest, but I'm not sure if this is related to this patch. I will explore a little further.