Re: Ideas? Weird WOWorkerThread hang on Solaris

Chuck Hill Sun, 08 May 2011 20:11:46 -0700

Hi Klaus,

Thanks for the reply.  Late Friday, we tracked it down (we think!) to a bad 
build of Apache 2.0 or (more likely) of mod_webobjects.    We rolled back to 
Apache 1.3 and the stock Solaris adaptor from Apple (uh, yeah, WO 5.2).  The 
problem noted below went away.
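
For anyone reading the trace quoted below: a worker thread sitting in 
socketRead0 under readRequestFromSocket has accepted a connection and is 
waiting for the other end to send the request.  Here is a rough sketch that 
reproduces that state with plain java.net sockets on a modern JDK.  It is not 
WO or adaptor code; the class name SilentPeerDemo and the 200 ms timeout are 
made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of WebObjects or Wonder.
final class SilentPeerDemo {

    /**
     * Accepts a loopback connection from a client that never sends anything
     * and reports whether the server-side read gave up after timeoutMillis.
     */
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket silentClient = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMillis);
            InputStream in = accepted.getInputStream();
            try {
                // Blocks in the native socket read because the peer is silent.
                in.read();
                return false; // data or EOF arrived, so no timeout
            } catch (SocketTimeoutException e) {
                return true;  // nothing arrived within the timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

If something similar happens between the adaptor and the instances, the read 
just sits there until bytes arrive or a timeout fires, which would be 
consistent with what we saw.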


I discovered that they had used the SolarisAdaptor.zip referenced in 
http://wiki.objectstyle.org/confluence/display/WO/Deploying+on+Solaris+%28WO+5.3.3%29
My working theory at this point is that this branch of the adaptor source is 
defective in some way under load.  The plan is to build either the Apache 2.0 
or 2.2 adaptor (the 2.2 version will require a full build of Apache as well) 
from Wonder and give that a trial.  If that resolves the defect, I will update 
the Wiki page and give mDimension a copy for their mod_webobjects downloads 
(kind thanks to Bill Chin & co for hosting this).
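
When trialling a new build, one quick way to tell whether the hang is back is 
to count how many worker threads in a saved thread dump have socketRead0 as 
their top frame.  A minimal sketch, assuming the dump looks like the one 
quoted below (thread name in double quotes, frames starting with "at").  The 
class name StuckWorkerCounter is invented, and this is meant to run on a 
current JDK on your own machine, not inside the 1.4.2 app.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WebObjects or Wonder.
final class StuckWorkerCounter {

    // Thread header lines look like: "WorkerThread60" prio=5 tid=...
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\".*");

    /** Counts WorkerThread entries whose top frame is SocketInputStream.socketRead0. */
    static int countStuckWorkers(String dump) {
        int stuck = 0;
        boolean expectTopFrame = false;
        for (String raw : dump.split("\\r?\\n")) {
            String line = raw.trim();
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                // Only the first frame after a WorkerThread header is of interest.
                expectTopFrame = m.group(1).startsWith("WorkerThread");
                continue;
            }
            if (expectTopFrame && line.startsWith("at ")) {
                if (line.contains("java.net.SocketInputStream.socketRead0")) {
                    stuck++;
                }
                expectTopFrame = false;
            }
        }
        return stuck;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a saved thread dump.
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countStuckWorkers(dump) + " worker thread(s) in socketRead0");
    }
}
```

It only looks at the top frame, so it does not distinguish a thread waiting 
for a request from one reading something else off a socket.  A few idle 
workers parked there is expected; the whole pool parked there while 
WOAdaptorInfo shows active requests is the symptom.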

Which version of mod_webobjects are you using with Solaris?

Chuck

P.S.  A max of 4 sessions per instance is NOT normal for a WO app.  I would go 
with closer to 400 :-)  The more useful statistic is requests per minute per 
instance, and that varies a lot with what your app is doing.
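
To make that arithmetic concrete, here is a small sketch.  The class name 
and the sample numbers are invented; plug in whatever your own monitoring 
reports.

```java
// Hypothetical helper for illustration only.
final class InstanceLoad {

    /** Requests per minute per instance, from a request count over an interval. */
    static double requestsPerMinutePerInstance(long requests, double seconds, int instances) {
        if (seconds <= 0 || instances <= 0) {
            throw new IllegalArgumentException("seconds and instances must be positive");
        }
        double minutes = seconds / 60.0;
        return requests / minutes / instances;
    }

    public static void main(String[] args) {
        // Made-up sample: 6000 requests over 5 minutes across 5 instances.
        System.out.println(requestsPerMinutePerInstance(6000, 300.0, 5));
    }
}
```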




On May 7, 2011, at 12:48 PM, Klaus Berkling wrote:

> 
> Hi Chuck.
> 
> On May 5, 2011, at 3:12 PM, Chuck Hill wrote:
> 
>> The environment is multiple SunFire boxes running Solaris 10 something with 
>> a WO 5.2.4 (nope, not a typo) app using JDK 1.4.2.  This is with the newest 
>> Wonder mod_webobjects re-compiled for Apache 2.2 on these machines.  wotaskd 
>> and JavaMonitor are the latest from Wonder, but the stock 5.2.4 versions of 
>> all three exhibited the same problem.  There is a web balancer in front, 
>> with Apache running on three machines, and instances running on 5 machines.
>> 
>> Most of the time the instances are very responsive with dispatchRequest 
>> processing times averaging 41ms.  Then under increasing load (users per 
>> instance, we can reproduce this with one instance and 20 people), responses 
>> start slowing down, quickly spreading to all users.
>> 
>> The WOAdaptorInfo page shows multiple active requests in each instance while 
>> dispatchRequest shows an idle instance.  If we get a thread dump from the 
>> instance, we see that all of the WOWorkerThreads are blocked in socketRead 
>> (the pool quickly grows from 16 to the max configured):
>> 
>> "WorkerThread60" prio=5 tid=0x00100cb0 nid=0x47 runnable [0xe08ff000..0xe08ffc30]
>>      at java.net.SocketInputStream.socketRead0(Native Method)
>>      at java.net.SocketInputStream.read(Unknown Source)
>>      at com.webobjects.appserver._private.WOHttpIO.refillInputBuffer(WOHttpIO.java:131)
>>      at com.webobjects.appserver._private.WOHttpIO.readLine(WOHttpIO.java:187)
>>      at com.webobjects.appserver._private.WOHttpIO.readRequestFromSocket(WOHttpIO.java:279)
>>      at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:79)
>>      at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:254)
>>      at java.lang.Thread.run(Unknown Source)
>> 
>> Then after several seconds to nearly a minute, they all unblock and complete 
>> normally.  Sometimes it takes longer than a minute (Receive Timeout is set 
>> to a high 60 seconds) and the users get bounced to a new instance where they 
>> get an "unable to restore session" message.  Oddly, all the servers seem to 
>> get blocked and unblocked at pretty much the same time.
>> 
>> I think the problem exists in the other direction as well, as we have seen 
>> cases where the page partly loaded and finished after two or more minutes.
>> 
>> There is little CPU, memory, or I/O load on the machines.  The network tests 
>> out OK.  Apache vends static resources quickly when the apps are stuck.
>> 
>> Any ideas?  It looks to be a problem in the TCP communications between the 
>> app servers and the instance servers.  But what?  Anyone want to play?
> 
> 
> Not sure if I can point you to things you haven't already looked at.
> 
> Our WO app is different from most others; keeping that in mind, here are some 
> thoughts:
> - Check the sysctl values (maxfiles, maxproc, etc).
> - 20 users (sessions?) on one instance seems a bit much; our instances 
> become unhappy when they reach around 4 sessions. Our app is database 
> intensive so it makes a difference for us. What happens if you add 3 more 
> instances for each existing instance?
> Less relevant:
> - Keep the number of running httpd processes low. Never liked keeping a high 
> number of idle httpd servers running.
> - Any slow database queries?
> 
> Hope this helps.
> 
> 
> 
> kib
> 
> "We keep moving forward, opening new doors, and doing new things, because 
> we're curious and curiosity keeps leading us down new paths."
> Walt Disney
> 
> Klaus Berkling
> Web Application Dev. & Systems Administrator
> DynEd International, Inc.
> www.dyned.com | www.eskimo.com/~kiberkli
> 
> 
> 
> 

-- 
Chuck Hill             Senior Consultant / VP Development

Come to WOWODC this July for unparalleled WO learning opportunities and real 
peer to peer problem solving!  Network, socialize, and enjoy a great 
cosmopolitan city.  See you there!  http://www.wocommunity.org/wowodc11/


Webobjects-dev mailing list
