Re: zookeeper seems to hang

2010-08-12 Thread Patrick Hunt

Great bug report, Ted. The stack trace in particular is very useful.

It looks like a timing bug in which the client does not shut down
cleanly on the close call. I reviewed the code in question, but nothing
jumps out at me. The logs show only the shutdown itself, with nothing
else from ZooKeeper in them.
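
Until the root cause is known, a caller could bound how long it waits for close() to return. The following is only a sketch under stated assumptions: `BoundedClose` and `closeWithTimeout` are hypothetical names, not part of ZooKeeper or HBase, and the `Runnable` stands in for whatever close call might block. It runs the close on a daemon thread, so a stuck close would no longer keep the JVM alive the way the non-daemon regionserver thread does in the dump.

```java
final class BoundedClose {
    /**
     * Runs closeAction on a daemon thread and waits at most timeoutMs for it.
     * timeoutMs must be greater than 0, because join(0) waits forever.
     * Returns true if the action finished in time, false if it is still running.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMs) {
        Thread closer = new Thread(closeAction, "bounded-close");
        closer.setDaemon(true);          // a stuck close must not block JVM exit
        closer.start();
        try {
            closer.join(timeoutMs);      // bounded wait for the close to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
        }
        return !closer.isAlive();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a close() that never returns:
        // it waits on a monitor that nobody ever notifies.
        Object neverNotified = new Object();
        Runnable hangingClose = () -> {
            synchronized (neverNotified) {
                try {
                    while (true) {
                        neverNotified.wait();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };

        System.out.println("clean close finished: "
                + closeWithTimeout(() -> { }, 1000));
        System.out.println("hanging close finished: "
                + closeWithTimeout(hangingClose, 200));
    }
}
```

This only limits the damage on the caller's side. It does not release whatever the blocked close is holding, so it is no substitute for fixing the underlying bug.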


Create a JIRA and attach all the details you have available.

Patrick

On 08/11/2010 03:21 PM, Ted Yu wrote:

Hi,
Using HBase 0.20.6 (with HBASE-2473), we encountered a situation where
the region server process seemed to hang while shutting down.

Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

We are using zookeeper-3.2.2.

Your comments are welcome.

Here is the relevant portion of the jstack output. I tried twice to
attach the jstack output to my email to d...@hbase.apache.org, but the
attachment failed:

DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
        - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
        at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
        - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
        at java.lang.Thread.run(Thread.java:619)

main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

RMI TCP Accept-0 daemon prio=10 tid=0x2aabb822c800 nid=0x6c7d runnable [0x40752000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
        - locked <0x2aaabf585578> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
        at java.lang.Thread.run(Thread.java:619)
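
The regionserver frames above show a thread parked in Object.wait() on a per-request object, which is what one would expect if the notification for that request never arrives. The sketch below illustrates that general pattern and a bounded alternative. It is not ZooKeeper's actual code: `PendingRequest`, `complete`, `awaitForever`, and `await` are hypothetical names chosen for illustration.

```java
final class PendingRequest {
    private boolean finished;            // guarded by this object's monitor

    /** Called by whichever thread delivers the reply. */
    synchronized void complete() {
        finished = true;
        notifyAll();
    }

    /** Unbounded wait: blocks forever if complete() is never called. */
    synchronized void awaitForever() throws InterruptedException {
        while (!finished) {
            wait();
        }
    }

    /**
     * Bounded wait: returns true if complete() was called within timeoutMs,
     * false if the deadline passed (or the thread was interrupted) first.
     */
    synchronized boolean await(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        try {
            while (!finished) {
                long remainingNs = deadline - System.nanoTime();
                if (remainingNs <= 0) {
                    return false;        // gave up: no reply arrived in time
                }
                // Add 1 ms so we never call wait(0), which means "wait forever".
                wait(remainingNs / 1_000_000L + 1);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return finished;
        }
    }
}
```

With the unbounded form, a reply that is lost during shutdown would leave the caller waiting indefinitely. The bounded form at least lets the caller notice and report the problem.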



Re: zookeeper seems to hang

2010-08-12 Thread Ted Yu

Please see:
https://issues.apache.org/jira/browse/ZOOKEEPER-846
