Ideas? Weird WOWorkerThread hang on Solaris

Chuck Hill Thu, 05 May 2011 15:12:53 -0700

Hi,

The environment consists of multiple SunFire boxes running some flavor of Solaris 10 with a 
WO 5.2.4 (nope, not a typo) app using JDK 1.4.2.  This is with the newest 
Wonder mod_webobjects re-compiled for Apache 2.2 on these machines.  wotaskd 
and JavaMonitor are the latest from Wonder, but the stock 5.2.4 versions of all 
three exhibited the same problem.  There is a web balancer in front, with 
Apache running on three machines and instances running on five machines.


Most of the time the instances are very responsive, with dispatchRequest 
processing times averaging 41 ms.  Then, under increasing load (users per 
instance; we can reproduce this with one instance and 20 people), responses 
start slowing down, and the slowdown quickly spreads to all users.

The WOAdaptorInfo page shows multiple active requests in each instance while 
dispatchRequest shows an idle instance.  If we get a thread dump from the 
instance, we see that all of the WOWorkerThreads are blocked in socketRead 
(the pool quickly grows from 16 to the configured maximum):

"WorkerThread60" prio=5 tid=0x00100cb0 nid=0x47 runnable [0xe08ff000..0xe08ffc30]
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at com.webobjects.appserver._private.WOHttpIO.refillInputBuffer(WOHttpIO.java:131)
        at com.webobjects.appserver._private.WOHttpIO.readLine(WOHttpIO.java:187)
        at com.webobjects.appserver._private.WOHttpIO.readRequestFromSocket(WOHttpIO.java:279)
        at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:79)
        at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:254)
        at java.lang.Thread.run(Unknown Source)

Then, after several seconds to nearly a minute, they all unblock and complete 
normally.  Sometimes it takes longer than a minute (Receive Timeout is set to a 
high 60 seconds) and the users get bounced to a new instance, where they get an 
"unable to restore session" message.  Oddly, all the servers seem to get 
blocked and unblocked at pretty much the same time.
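One way to check whether all the servers really block and unblock together is to take thread dumps from every instance at the same moment and count the workers parked in the request read. Below is a minimal sketch, written for a current JDK rather than 1.4.2 and assuming the standard text dump format shown above; the class and method names (`ThreadDumpScan`, `countWaitingWorkers`) are hypothetical and not part of WebObjects.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadDumpScan {

    /** Two-thread excerpt in the same format as the dump above (hypothetical second thread). */
    public static final String SAMPLE =
        "\"WorkerThread60\" prio=5 tid=0x00100cb0 nid=0x47 runnable [0xe08ff000..0xe08ffc30]\n"
        + "        at java.net.SocketInputStream.socketRead0(Native Method)\n"
        + "        at java.net.SocketInputStream.read(Unknown Source)\n"
        + "        at com.webobjects.appserver._private.WOHttpIO.readRequestFromSocket(WOHttpIO.java:279)\n"
        + "        at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:254)\n"
        + "\n"
        + "\"WorkerThread61\" prio=5 tid=0x00100d00 nid=0x48 waiting on condition\n"
        + "        at java.lang.Object.wait(Native Method)\n"
        + "        at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:254)\n";

    /** Splits a dump into per-thread stanzas; each stanza starts with a line beginning with a quote. */
    static List<String> stanzas(String dump) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null;
        for (String line : dump.split("\n")) {
            if (line.startsWith("\"")) {
                if (current != null) {
                    result.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            result.add(current.toString());
        }
        return result;
    }

    /** True if the top frame is socketRead0 and the thread is inside readRequestFromSocket. */
    static boolean waitingForRequest(String stanza) {
        for (String line : stanza.split("\n")) {
            String frame = line.trim();
            if (frame.startsWith("at ")) {
                return frame.contains("socketRead0")
                    && stanza.contains("WOHttpIO.readRequestFromSocket");
            }
        }
        return false; // stanza had no stack frames
    }

    /** Counts WorkerThreads blocked reading a request from the adaptor. */
    public static int countWaitingWorkers(String dump) {
        int count = 0;
        for (String stanza : stanzas(dump)) {
            String header = stanza.substring(0, stanza.indexOf('\n'));
            if (header.contains("WorkerThread") && waitingForRequest(stanza)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Workers waiting for request bytes: " + countWaitingWorkers(SAMPLE));
    }
}
```

Running it over dumps captured at the same timestamps on each instance machine would show whether the counts rise and fall in lockstep, which would tend to point away from anything local to a single box.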

I think the problem exists in the other direction as well: we have seen 
cases where the page partly loaded and finished after two or more minutes.

There is little CPU, memory, or I/O load on the machines.  The network tests 
out OK.  Apache vends static resources quickly when the apps are stuck.

Any ideas?  It looks to be a problem in the TCP communications between the app 
servers and the instance servers.  But what?  Anyone want to play?
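For reference, a worker blocked in `socketRead0` on the instance side simply means it has a connection and is waiting for request bytes that have not arrived yet. The sketch below, again for a current JDK and with a hypothetical class name (`ReadTimeoutDemo`), reproduces that state on loopback: the client connects but never writes, so `read()` blocks until the socket's `SO_TIMEOUT` (set with `Socket.setSoTimeout`) expires. It only illustrates generic Java socket behaviour; it does not claim this is how WO or the adaptor apply the Receive Timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Accepts one loopback connection whose client never sends anything and
     * reads from it with the given SO_TIMEOUT. Returns the milliseconds spent
     * blocked before the timeout fired, or -1 if read() returned instead.
     */
    public static long waitForFirstByte(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort()); // connects, never writes
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMillis);
            InputStream in = accepted.getInputStream();
            long start = System.nanoTime();
            try {
                in.read(); // blocks until bytes arrive or the timeout fires
                return -1;
            } catch (SocketTimeoutException e) {
                return (System.nanoTime() - start) / 1_000_000;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long waited = waitForFirstByte(250);
        System.out.println("read() timed out after about " + waited + " ms");
    }
}
```

If the stall is somewhere on the TCP path from Apache to the instance, the instance cannot tell the difference between that and a client that is simply slow to send, which would fit the dump above.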


Chuck

-- 
Chuck Hill             Senior Consultant / VP Development

Come to WOWODC this July for unparalleled WO learning opportunities and real 
peer to peer problem solving!  Network, socialize, and enjoy a great 
cosmopolitan city.  See you there!  http://www.wocommunity.org/wowodc11/

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com

This email sent to [email protected]