Mike,

On 1/2/12 9:55 PM, Mike Wertheim wrote:
> On Mon, Jan 2, 2012 at 3:09 PM, Christopher Schultz wrote:
>> What were those 4 RUNNABLE threads doing?
>
> They were processing POST requests from the client with a somewhat
> large data payload.
>
> Here's the stack from one of them:
>
> "http-apr-8080-exec-169" daemon prio=10 tid=0x00002aaabc893000 nid=0x5253 runnable [0x00000000451a9000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.tomcat.jni.Socket.recvbb(Native Method)
>         at org.apache.coyote.http11.InternalAprInputBuffer.fill(InternalAprInputBuffer.java:614)
>         at org.apache.coyote.http11.InternalAprInputBuffer$SocketInputBuffer.doRead(InternalAprInputBuffer.java:656)
>         at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:118)
>         at org.apache.coyote.http11.InternalAprInputBuffer.doRead(InternalAprInputBuffer.java:548)
>         at org.apache.coyote.Request.doRead(Request.java:422)
>         at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
>         at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:429)
>         at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)
>         at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:200)
>         at org.apache.catalina.connector.Request.readPostBody(Request.java:3096)
>         at org.apache.catalina.connector.Request.parseParameters(Request.java:3049)
>         at org.apache.catalina.connector.Request.getParameter(Request.java:1139)
>         at org.apache.catalina.connector.RequestFacade.getParameter(RequestFacade.java:382)
>         at com.mycompany.UploadServlet.service(UploadServlet.java:29)

Okay, that looks like a fairly standard reading-the-POST-body
operation. I don't think there's anything concerning in there.
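Just to illustrate what that stack is doing (a minimal sketch, not
your actual servlet; the parameter name "data" is my invention): the
first getParameter() call on a form-encoded POST forces Tomcat to read
and buffer the entire request body before it can return, which is why
a large or slowly-arriving payload keeps the worker thread down in
Socket.recvbb:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class UploadServlet extends HttpServlet {
    @Override
    protected void service(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // This innocent-looking call triggers Request.parseParameters(),
        // which calls readPostBody(), which sits in Socket.recvbb() until
        // the whole body has arrived from the client.
        String data = req.getParameter("data"); // hypothetical parameter
        resp.getWriter().println(data == null ? 0 : data.length());
    }
}

So those threads were most likely just waiting on bytes from slow (or
large) uploads, which is normal.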
>>> For the port 8443 connector, there are 81 threads in "waiting
>>> on condition" and 119 threads in "runnable".
>>
>> Well, that looks like it might be a problem. What about the
>> stack traces for THOSE threads?
>
> They mostly look like this:
>
> "http-apr-8443-exec-200" daemon prio=10 tid=0x00002aaac8132800 nid=0x520f runnable [0x0000000044fba000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.tomcat.jni.SSLSocket.handshake(Native Method)
>         at org.apache.tomcat.util.net.AprEndpoint.setSocketOptions(AprEndpoint.java:751)
>         at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:1766)
>         - locked <0x000000067f3101e0> (a org.apache.tomcat.util.net.SocketWrapper)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)

Uh, oh. If you have 81 threads stuck in that state, then something is
probably broken with the SSL handshake. Do you have any suspicion of
some kind of DoS attack or something? I have to imagine that the APR
SSL code is fairly robust.

I'd be interested to see just how "runnable" those threads really are:
as Chuck points out, a thread Java reports as RUNNABLE may actually be
blocked in native code somewhere. At least blocked threads don't burn
CPU time, even though they might not be accomplishing anything useful.
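If you can get a bit of code running inside that JVM (a scratch JSP
would do), something like this rough sketch, using only the standard
ThreadMXBean API, would tell you whether those "RUNNABLE" handshake
threads are actually consuming CPU or just parked in native code. The
one-second sample window and the thread-name prefix are just my
guesses:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class HandshakeCpuCheck {
    // Prints CPU time burned over a ~1 second window by each thread in
    // the 8443 pool. Must run inside the Tomcat JVM (or be adapted to a
    // remote JMX connection).
    public static void sample() throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.err.println("JVM cannot report per-thread CPU time");
            return;
        }
        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // nanos; -1 if dead
        }
        Thread.sleep(1000);
        for (int i = 0; i < ids.length; i++) {
            ThreadInfo info = mx.getThreadInfo(ids[i]);
            long after = mx.getThreadCpuTime(ids[i]);
            if (info == null || before[i] < 0 || after < 0) continue;
            if (info.getThreadName().startsWith("http-apr-8443-exec-")) {
                System.out.printf("%s state=%s cpu=%.0fms over 1s%n",
                        info.getThreadName(), info.getThreadState(),
                        (after - before[i]) / 1e6);
            }
        }
    }
}

A thread spinning in the handshake should report close to 1000ms
there; one merely blocked in native code should report roughly zero.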
>> When you say "slowly grinds to a halt" do you mean that there's a
>> point where you can no longer accept HTTP requests? How about
>> HTTPS? When you get stuck, what does 'top' show? Do the threads
>> look idle or are you burning up your CPU?
>
> It definitely stops accepting some (but not all) http requests.
>
> To answer the other questions, I'd have to run the test again (and
> it takes a day to fail).

It's odd that HTTP requests stall when the HTTPS connector looks like
the one suffering. If those threads were spinning rather than blocked,
that might point to thread starvation, but in your followup you said
the CPU was idle. Hmm.

Can you reproduce this in a test environment? If so, lower the number
of threads in each connector to trigger the problem sooner: instead of
200 threads per connector, use 20 and see if you can lock it up in
less time (see the P.S. below for roughly what I mean). I suspect you
can. If this is in production, I highly recommend that you find
yourself a test box that you can abuse while investigating this issue.

Are there any errors in any of the logs? Look at logs/*, not just
catalina.out. Also check syslog, just in case something ended up in
there for whatever reason.

Finally, please follow the "7.0.22 + fd leak with APR/native" thread
to see whether any of it matches your observations.

-chris
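P.S. In case it's useful, here is roughly the kind of change I mean
for the test box. This is a hypothetical server.xml fragment, not your
actual configuration; maxThreads="20" is just an example value, and
you'd keep the rest of your connector attributes as they are:

<!-- Same two connectors, just with smaller pools so the hang
     reproduces faster. Only maxThreads changes. -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11AprProtocol"
           maxThreads="20" connectionTimeout="20000" />
<Connector port="8443"
           protocol="org.apache.coyote.http11.Http11AprProtocol"
           maxThreads="20" SSLEnabled="true" scheme="https" secure="true"
           SSLCertificateFile="..." SSLCertificateKeyFile="..." />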