[jira] Commented: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913422#action_12913422 ] Patrick Hunt commented on ZOOKEEPER-846: committed to trunk and branch33 zookeeper client doesn't shut down cleanly on the close call Key: ZOOKEEPER-846 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.2 Reporter: Ted Yu Assignee: Patrick Hunt Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: rs-13.stack, ZOOKEEPER-846.patch Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913497#action_12913497 ] Hudson commented on ZOOKEEPER-846: -- Integrated in ZooKeeper-trunk #944 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/944/]) ZOOKEEPER-846. zookeeper client doesn't shut down cleanly on the close call zookeeper client doesn't shut down cleanly on the close call Key: ZOOKEEPER-846 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.2 Reporter: Ted Yu Assignee: Patrick Hunt Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: rs-13.stack, ZOOKEEPER-846.patch Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909634#action_12909634 ] Hadoop QA commented on ZOOKEEPER-846: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12454491/ZOOKEEPER-846.patch against trunk revision 997192. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/111/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/111/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/111/console This message is automatically generated. zookeeper client doesn't shut down cleanly on the close call Key: ZOOKEEPER-846 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.2 Reporter: Ted Yu Assignee: Patrick Hunt Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: rs-13.stack, ZOOKEEPER-846.patch Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907523#action_12907523 ] Patrick Hunt commented on ZOOKEEPER-846: I believe I found the issue. Both this specific jira and HBASE-2966. During session close we set the closing flag, then later set the ZK state to CLOSED. zk client operations track outstanding requests to the server on the outstandingqueue. There's a timing window where we can clear the outstanding queue (caused by the closing flag being set AND an error occurs on the socket) and exit the sendthread. Subsequent to this if an operation runs before the ZK state is set to CLOSED it will queue additional packets to the outstandingqueue. These packets will never be processed, and as a result the original operation will never complete. in queuepacket we need to check if closing is set before queueing a packet (we currently check the state only) I'm working on a test/fix. zookeeper client doesn't shut down cleanly on the close call Key: ZOOKEEPER-846 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.2 Reporter: Ted Yu Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: rs-13.stack Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.