[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907523#action_12907523
 ] 

Patrick Hunt commented on ZOOKEEPER-846:
----------------------------------------

I believe I found the issue behind both this JIRA and HBASE-2966.

During session close we set the "closing" flag, and only later set the ZK state
to CLOSED. ZK client operations track outstanding requests to the server on the
"outstandingqueue". There is a timing window in which the sendthread clears the
outstanding queue (this happens when the "closing" flag is set AND an error
occurs on the socket) and then exits. If an operation runs after that, but
before the ZK state is set to CLOSED, it queues additional packets to the
outstandingqueue. Nothing is left to process these packets, so the operation
that queued them never completes.

In queuepacket we need to check whether "closing" is set before queueing a
packet (we currently check only the state).
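
To make the window concrete, here is a minimal sketch of the check described
above. It is not the real ClientCnxn code and not the eventual patch: the class
ClosingCheckSketch, its methods, and the Packet fields are all hypothetical
names chosen for illustration. It also leaves out how the close request itself
gets sent. It assumes a packet rejected at queue time is completed immediately
with a connection-loss style error so that its waiter wakes up.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified stand-in for the client connection (not ClientCnxn).
class ClosingCheckSketch {
    enum State { CONNECTED, CLOSED }

    static final class Packet {
        boolean finished;        // true once the waiting caller may return
        boolean connectionLost;  // true if completed with an error, not a reply
    }

    private final Queue<Packet> outstandingQueue = new ArrayDeque<>();
    private volatile State state = State.CONNECTED;
    private volatile boolean closing = false;

    // Start of session close: the flag is set before the state becomes CLOSED.
    void beginClose() {
        closing = true;
    }

    // Simulates the send thread failing everything queued and then exiting
    // after a socket error during close.
    void sendThreadExitsOnError() {
        synchronized (outstandingQueue) {
            for (Packet p : outstandingQueue) {
                fail(p);
            }
            outstandingQueue.clear();
        }
    }

    // End of session close.
    void finishClose() {
        state = State.CLOSED;
    }

    // The proposed check: refuse to queue when the state is not alive OR
    // when closing is already set. Checking the state alone leaves the window.
    Packet queuePacket() {
        Packet p = new Packet();
        synchronized (outstandingQueue) {
            if (state != State.CONNECTED || closing) {
                fail(p); // complete now; nobody is left to process the queue
            } else {
                outstandingQueue.add(p);
            }
        }
        return p;
    }

    // Completes a packet with an error and wakes any thread waiting on it.
    private static void fail(Packet p) {
        synchronized (p) {
            p.connectionLost = true;
            p.finished = true;
            p.notifyAll();
        }
    }

    int queued() {
        synchronized (outstandingQueue) {
            return outstandingQueue.size();
        }
    }
}
```

With the "|| closing" condition removed, a packet queued between
beginClose() and finishClose() would sit in the queue unfinished, which is the
hang described above.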

I'm working on a test/fix.

> zookeeper client doesn't shut down cleanly on the close call
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-846
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.2.2
>            Reporter: Ted Yu
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: rs-13.stack
>
>
> Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where the
> Regionserver process was shutting down and seemed to hang.
> Here is the bottom of the region server log:
> http://pastebin.com/YYawJ4jA
> zookeeper-3.2.2 is used.
> Here is the relevant portion from jstack (I attempted to attach the jstack
> output twice in my email to d...@hbase.apache.org but failed):
> "DestroyJavaVM" prio=10 tid=0x00002aabb849c800 nid=0x6c60 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "regionserver/10.32.42.245:60020" prio=10 tid=0x00002aabb84ce000 nid=0x6c81 in Object.wait() [0x0000000043755000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>         - locked <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>         at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>         - locked <0x00002aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>         at java.lang.Thread.run(Thread.java:619)
> "main-EventThread" daemon prio=10 tid=0x0000000043474000 nid=0x6c80 waiting on condition [0x00000000413f3000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00002aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
