[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907523#action_12907523 ]
Patrick Hunt commented on ZOOKEEPER-846:
----------------------------------------

I believe I found the issue behind both this specific JIRA and HBASE-2966.

During session close we set the "closing" flag, then later set the ZK state to CLOSED. ZK client operations track outstanding requests to the server on the "outstandingqueue". There's a timing window where we can clear the outstanding queue (caused by the "closing" flag being set AND an error occurring on the socket) and exit the sendthread. If an operation then runs before the ZK state is set to CLOSED, it will queue additional packets to the outstandingqueue. These packets will never be processed, and as a result the original operation will never complete.

In queuePacket we need to check whether closing is set before queueing a packet (we currently check the state only).

I'm working on a test/fix.

> zookeeper client doesn't shut down cleanly on the close call
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-846
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.2.2
>            Reporter: Ted Yu
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: rs-13.stack
>
>
> Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where the Regionserver
> process was shutting down and seemed to hang.
> Here is the bottom of the region server log:
> http://pastebin.com/YYawJ4jA
> zookeeper-3.2.2 is used.
> Here is relevant portion from jstack - I attempted to attach jstack twice in
> my email to d...@hbase.apache.org but failed:
>
> "DestroyJavaVM" prio=10 tid=0x00002aabb849c800 nid=0x6c60 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "regionserver/10.32.42.245:60020" prio=10 tid=0x00002aabb84ce000 nid=0x6c81 in Object.wait() [0x0000000043755000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>         - locked <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>         at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>         - locked <0x00002aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>         at java.lang.Thread.run(Thread.java:619)
>
> "main-EventThread" daemon prio=10 tid=0x0000000043474000 nid=0x6c80 waiting on condition [0x00000000413f3000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00002aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
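
For illustration only, the timing window described in the comment at the top of this message can be modelled roughly as follows. This is a toy, single-threaded sketch and not the actual ClientCnxn code: the class ClosingFlagSketch, its methods, and the checkClosing switch are hypothetical names made up here, and the real fix may look different. It replays the sequence "closing flag set, socket error clears the queue and the send thread exits, a late operation queues a packet before the state is CLOSED" and shows whether that late packet would ever be completed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the close() timing window; hypothetical, NOT the real ClientCnxn. */
public class ClosingFlagSketch {
    enum State { CONNECTED, CLOSED }

    /** Stand-in for a request packet; a caller would block until finished is true. */
    static final class Packet {
        boolean finished;
    }

    private final Queue<Packet> outstandingQueue = new ArrayDeque<>();
    private State state = State.CONNECTED;
    private boolean closing = false;
    private final boolean checkClosing; // true = also consult the closing flag (proposed behaviour)

    ClosingFlagSketch(boolean checkClosing) {
        this.checkClosing = checkClosing;
    }

    /** Session close has started: the flag is set, but the state is not CLOSED yet. */
    synchronized void beginClose() {
        closing = true;
    }

    /** Socket error while closing: fail everything outstanding; the send thread then exits. */
    synchronized void sendThreadExitsOnSocketError() {
        for (Packet p : outstandingQueue) {
            p.finished = true; // waiters are released with an error
        }
        outstandingQueue.clear();
    }

    /** Queue a request, or fail it immediately if the connection can no longer serve it. */
    synchronized Packet queuePacket() {
        Packet p = new Packet();
        boolean reject = state != State.CONNECTED || (checkClosing && closing);
        if (reject) {
            p.finished = true;        // caller is released right away
        } else {
            outstandingQueue.add(p);  // nobody is left to process this once the send thread is gone
        }
        return p;
    }

    /** Replays the sequence from the comment and reports whether the late request completes. */
    static boolean lateRequestCompletes(boolean checkClosing) {
        ClosingFlagSketch cnxn = new ClosingFlagSketch(checkClosing);
        cnxn.beginClose();
        cnxn.sendThreadExitsOnSocketError();
        Packet late = cnxn.queuePacket(); // runs before the state becomes CLOSED
        return late.finished;
    }

    public static void main(String[] args) {
        System.out.println("state-only check, late request completes: " + lateRequestCompletes(false));
        System.out.println("state + closing check, late request completes: " + lateRequestCompletes(true));
    }
}
```

With the state-only check the late packet is left on the queue unfinished, which corresponds to the thread stuck in Object.wait() inside submitRequest in the jstack above; with the additional closing check it is failed immediately and the caller can return.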