Runaway thread
--------------

                 Key: ZOOKEEPER-865
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-865
             Project: Zookeeper
          Issue Type: Bug
    Affects Versions: 3.3.1, 3.3.0
         Environment: Linux; Java 1.6; x86;
            Reporter: Stephen McCants
            Priority: Critical


I'm starting a standalone Zookeeper server (v3.3.1).  That starts normally and 
does not have a runaway thread.

Next, I start an based Eclipse application that is using ZK 3.3.0 to register 
itself with the ZooKeeper server (3.3.1).  The Eclipse application using the 
following arguments to Eclipse:

-Dzoodiscovery.autoStart=true
-Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=smccants.austin.ibm.com

When the Eclipse application starts, the ZK server prints out:

2010-09-03 09:59:46,006 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@250] - 
Accepted socket connection from /9.53.189.11:42271
2010-09-03 09:59:46,039 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@776] - Client 
attempting to establish new session at /9.53.189.11:42271
2010-09-03 09:59:46,045 - INFO  [SyncThread:0:nioserverc...@1579] - Established 
session 0x12ad81b90000002 with negotiated timeout 4000 for client 
/9.53.189.11:42271
2010-09-03 09:59:46,046 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@250] - 
Accepted socket connection from /9.53.189.11:42272
2010-09-03 09:59:46,078 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@776] - Client 
attempting to establish new session at /9.53.189.11:42272
2010-09-03 09:59:46,080 - INFO  [SyncThread:0:nioserverc...@1579] - Established 
session 0x12ad81b90000003 with negotiated timeout 4000 for client 
/9.53.189.11:42272

Then both the Eclipse application and the ZK server go into runaway states and 
consume 100% of the CPU.

Here is a view from top:

  PID USER        PR  NI  VIRT    RES  SHR S %CPU %MEM    TIME+  COMMAND
4949 smccants  15   0  597m  78m 5964 S    66.2      1.0      1:03.14 
autosubmitter
4876 smccants  17   0  554m  27m 6688 S    30.9       0.3     0:34.74 java

PID 4949 (autosubmitter) is the Eclipse application and is using more than 
twice the CPU of PID 4876 (java) which is the ZK server.  They will continue in 
this state indefinitely.

I can attach a debugger to the Eclipse application and if I stop the thread 
named "pool-1-thread-2-SendThread(smccants.austin.ibm.com:2181)" and the 
runaway condition stops on both the application and ZK server.  However the ZK 
server reports:

2010-09-03 10:03:38,001 - INFO  [SessionTracker:zookeeperser...@315] - Expiring 
session 0x12ad81b90000003, timeout of 4000ms exceeded
2010-09-03 10:03:38,002 - INFO  [ProcessThread:-1:preprequestproces...@208] - 
Processed session termination for sessionid: 0x12ad81b90000003
2010-09-03 10:03:38,005 - INFO  [SyncThread:0:nioserverc...@1434] - Closed 
socket connection for client /9.53.189.11:42272 which had sessionid 
0x12ad81b90000003

Here is the stack trace from the suspended thread:

EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native 
method]   
EPollArrayWrapper.poll(long) line: 215  
EPollSelectorImpl.doSelect(long) line: 77       
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69  
EPollSelectorImpl(SelectorImpl).select(long) line: 80   
ClientCnxn$SendThread.run() line: 1066  

Any ideas what might be going wrong?

Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to