-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike,

On 1/2/12 9:55 PM, Mike Wertheim wrote:
> On Mon, Jan 2, 2012 at 3:09 PM, Christopher Schultz
>> What were those 4 RUNNABLE threads doing?
> 
> They were processing POST requests from the client with a somewhat 
> large data payload.
> 
> Here's the stack from one of them:
> 
> "http-apr-8080-exec-169" daemon prio=10 tid=0x00002aaabc893000 nid=0x5253 runnable [0x00000000451a9000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.tomcat.jni.Socket.recvbb(Native Method)
>         at org.apache.coyote.http11.InternalAprInputBuffer.fill(InternalAprInputBuffer.java:614)
>         at org.apache.coyote.http11.InternalAprInputBuffer$SocketInputBuffer.doRead(InternalAprInputBuffer.java:656)
>         at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:118)
>         at org.apache.coyote.http11.InternalAprInputBuffer.doRead(InternalAprInputBuffer.java:548)
>         at org.apache.coyote.Request.doRead(Request.java:422)
>         at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
>         at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:429)
>         at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)
>         at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:200)
>         at org.apache.catalina.connector.Request.readPostBody(Request.java:3096)
>         at org.apache.catalina.connector.Request.parseParameters(Request.java:3049)
>         at org.apache.catalina.connector.Request.getParameter(Request.java:1139)
>         at org.apache.catalina.connector.RequestFacade.getParameter(RequestFacade.java:382)
>         at com.mycompany.UploadServlet.service(UploadServlet.java:29)

Okay, that looks like a fairly-standard reading-the-POST-body
operation. I don't think there's anything concerning in there.

>>> For the port 8443 connector, there are 81 threads in "waiting
>>> on condition" and 119 threads in "runnable".
>> 
>> Well, that looks like it might be a problem. What about the
>> stack traces for THOSE threads?
>> 
> They mostly look like this:
> 
> "http-apr-8443-exec-200" daemon prio=10 tid=0x00002aaac8132800 nid=0x520f runnable [0x0000000044fba000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.tomcat.jni.SSLSocket.handshake(Native Method)
>         at org.apache.tomcat.util.net.AprEndpoint.setSocketOptions(AprEndpoint.java:751)
>         at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:1766)
>         - locked <0x000000067f3101e0> (a org.apache.tomcat.util.net.SocketWrapper)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)

Uh-oh. If you have 81 threads stuck in that state, then something is
probably broken with the SSL handshake. Do you have any reason to
suspect some kind of DoS attack? I have to imagine that the APR SSL
code itself is fairly robust.

I'd be interested to see just how "runnable" those threads really
are, given Chuck's point that a thread Java reports as RUNNABLE may
actually be blocked in native code. At least blocked threads don't
burn CPU time, even though they aren't accomplishing anything useful.
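If it helps, here's a quick way to tally things up from the dump
itself (a sketch: "threads.txt" is a stand-in for your jstack output,
and the sample dump below is fabricated for illustration):

```shell
# Fabricated two-thread dump standing in for a real jstack capture;
# on your box, point the greps at the actual dump file instead.
cat > threads.txt <<'EOF'
"http-apr-8443-exec-200" daemon prio=10 nid=0x520f runnable
   java.lang.Thread.State: RUNNABLE
        at org.apache.tomcat.jni.SSLSocket.handshake(Native Method)
"http-apr-8080-exec-169" daemon prio=10 nid=0x5253 runnable
   java.lang.Thread.State: RUNNABLE
        at org.apache.tomcat.jni.Socket.recvbb(Native Method)
EOF

# How many worker threads each connector pool has in the dump:
grep -o 'http-apr-[0-9]*-exec' threads.txt | sort | uniq -c

# How many threads are sitting in the native SSL handshake:
grep -c 'org.apache.tomcat.jni.SSLSocket.handshake' threads.txt
```

Taking two or three dumps a few seconds apart and diffing the counts
will tell you whether those threads are actually moving or truly stuck.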

>> When you say "slowly grinds to a halt" do you mean that there's
>> point where you can no longer accept HTTP requests? How about
>> HTTPS? When you get stuck, what does 'top' show? Do the threads
>> look idle or are you burning up your CPU?
> 
> It definitely stops accepting some (but not all) http requests.
> To answer the other questions, I'd have to run the test again (and
> it takes a day to fail).

It's odd that HTTP requests stall when the HTTPS connector looks like
the one that's suffering. That might point to thread starvation if
the threads are spinning instead of blocked, but in your followup you
said the CPU was idle. Hmm.

Can you reproduce this in a test environment? If so, lower the number
of threads in each connector in order to trigger the problem sooner.
So, instead of 200 threads per connector, use 20 and see if you can
lock it up in less time. I suspect you can.
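For reference, shrinking the pools is just a maxThreads change on
each <Connector> in server.xml. A sketch (attribute values other than
maxThreads are placeholders, not your actual config):

```xml
<!-- server.xml sketch: same APR connectors, much smaller pools, so
     pool exhaustion should show up in hours instead of a day.
     Everything except maxThreads here is a placeholder. -->
<Connector port="8080" protocol="org.apache.coyote.http11.Http11AprProtocol"
           maxThreads="20" connectionTimeout="20000" />
<Connector port="8443" protocol="org.apache.coyote.http11.Http11AprProtocol"
           maxThreads="20" SSLEnabled="true" scheme="https" secure="true" />
```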

If this is in production, I highly recommend that you find yourself a
test box that you can abuse while investigating this issue.

Any errors shown in any logs? Look at logs/*, not just catalina.out.
Also, check syslog just in case there's something in there for
whatever reason.
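A blunt first pass over the whole log directory would be something
like this (a sketch, run from $CATALINA_BASE; the patterns are just
common trouble markers, not an exhaustive list):

```shell
# Sweep everything under logs/ for common trouble markers, not just
# catalina.out. Case-insensitive so SEVERE/Error/exception all match.
mkdir -p logs   # no-op on a real install; keeps the sketch runnable anywhere
grep -riE 'severe|error|exception' logs/ || echo "no matches in logs/"
```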

Also, please follow the "7.0.22 + fd leak with APR/native" thread to
see if any of that matches your observations.

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8E3RsACgkQ9CaO5/Lv0PCT6wCgncxtU2TDkvynbxm2ukgsV3BL
L44AnRhdSpqvMLtdvgKn7trmhcqUp0Kn
=obOA
-----END PGP SIGNATURE-----
