[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-846:
-----------------------------------

    Attachment: ZOOKEEPER-846.patch

This patch fixes the problem by adding a check for close in queuePacket.

Note that this also moves the close=true assignment into queuePacket (otherwise
the closeSession request would never be queued during a close).

I spent more than a day trying to craft a test for this. Unfortunately, because
the timing window is so small, I wasn't able to get a test that fails in this
case. However, by inspection the patch seems to address a clear bug. All tests
are passing.
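
As a rough illustration of the idea (this is not the patch itself), here is a
minimal sketch. It assumes a simplified stand-in for the client connection:
the class SketchCnxn, the Packet fields, and the string request types are all
hypothetical. How a rejected packet is completed (marked finished with a
connection-lost flag) is also an assumption made for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified stand-in for a client connection; not the real ClientCnxn.
class SketchCnxn {
    static final class Packet {
        final String type;
        boolean finished;        // true once a caller waiting on this packet may stop waiting
        boolean connectionLost;  // assumed way of reporting a rejected request
        Packet(String type) { this.type = type; }
    }

    private final Queue<Packet> outgoingQueue = new ArrayDeque<>();
    private boolean closing = false;

    Packet queuePacket(String type) {
        Packet p = new Packet(type);
        synchronized (outgoingQueue) {
            if (closing) {
                // The connection is already closing: complete the packet right away
                // instead of queuing it, so no caller waits on a packet nobody will answer.
                p.connectionLost = true;
                p.finished = true;
                return p;
            }
            // The flag is set here, under the same lock as the check, so the
            // closeSession request itself still passes the check and gets queued.
            if ("closeSession".equals(type)) {
                closing = true;
            }
            outgoingQueue.add(p);
        }
        return p;
    }

    int queued() {
        synchronized (outgoingQueue) {
            return outgoingQueue.size();
        }
    }
}
```

Setting the flag inside the same synchronized block as the check is what keeps
the closeSession request from being rejected by its own close.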

> zookeeper client doesn't shut down cleanly on the close call
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-846
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.2.2
>            Reporter: Ted Yu
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: rs-13.stack, ZOOKEEPER-846.patch
>
>
> Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where the
> Regionserver process was shutting down and seemed to hang.
> Here is the bottom of region server log:
> http://pastebin.com/YYawJ4jA
> zookeeper-3.2.2 is used.
> Here is the relevant portion from jstack (I attempted to attach the jstack
> output twice in my email to d...@hbase.apache.org but failed):
> "DestroyJavaVM" prio=10 tid=0x00002aabb849c800 nid=0x6c60 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "regionserver/10.32.42.245:60020" prio=10 tid=0x00002aabb84ce000 nid=0x6c81 in Object.wait() [0x0000000043755000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>         - locked <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>         at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>         - locked <0x00002aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>         at java.lang.Thread.run(Thread.java:619)
> "main-EventThread" daemon prio=10 tid=0x0000000043474000 nid=0x6c80 waiting on condition [0x00000000413f3000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00002aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
