Hello everyone,

 

We have a high-load environment where we are running Tomcat 5.5.15
successfully. We are interested in reducing the system CPU load by
switching to Tomcat 6 with NIO, but so far we have run into a few issues.
After trying out the patch Filip recommended

(http://svn.apache.org/viewvc?view=rev&revision=618420) we did get
further (our CPUs were not pegged any more and exceptions went away),
but it appears that about 2/3rds of the requests are being dropped. I
was going to file a bug, but I thought I'd post it first to see if
anyone has any ideas.

 

We have many Tomcat servers running 5.5.15. I took one of the machines,
put 6.0.16 on it, and applied the patch. Each server gets around 3K
requests a minute during the time window when I was doing my experiments.
Using 6.0.16 with regular HTTP connections, we process around 3K requests
per minute. With NIO, we only see around 1.2K per minute. Here are the
two connector configurations.

 

Regular config

    <Connector port="8000"
               maxThreads="500" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="443" acceptCount="100"
               compression="on"
               compressionMinSize="2048"
               noCompressionUserAgents="gozilla, traviata"
               compressableMimeType="text/html,text/xml"
               connectionTimeout="0" disableUploadTimeout="true"
               maxHttpHeaderSize="8192" />

 

NIO config

    <Connector port="8000"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               acceptorThreadCount="4"
               maxThreads="500" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="443" acceptCount="100"
               compression="on"
               compressionMinSize="2048"
               noCompressionUserAgents="gozilla, traviata"
               compressableMimeType="text/html,text/xml"
               connectionTimeout="0" disableUploadTimeout="true"
               maxHttpHeaderSize="8192" />
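For completeness, this is roughly how I am counting requests per minute: tallying access log lines by their minute timestamp. The log path and the default AccessLogValve timestamp format (e.g. [18/Feb/2008:13:55:36 -0800]) are assumptions here.

```shell
# Count requests per minute from a Tomcat access log.
# access_log path and common-log timestamp format are assumptions.
cut -d'[' -f2 access_log | cut -d' ' -f1 | cut -d: -f1-3 | sort | uniq -c
```

Each output line is a count followed by a day/month/year:hour:minute key, so the rate drop under NIO shows up directly.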

 

I tried tuning various NIO parameters and had no luck. I can only
assume that the acceptors turn away some requests for some reason without
passing them down to the worker threads, or that I have misconfigured
something.
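On the misconfiguration angle, one variant I have not ruled out yet is using a finite connectionTimeout instead of "0" on the NIO connector, in case an infinite timeout interacts badly with the poller. The values below are guesses on my part, not tested recommendations:

    <Connector port="8000"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               acceptorThreadCount="4"
               maxThreads="500"
               enableLookups="false" redirectPort="443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true"
               maxHttpHeaderSize="8192" />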

 

Here are some interesting threads

 

A worker thread

http-8000-exec-20
52   TIMED_WAITING  20000000

.....omitted.....
javax.servlet.http.HttpServlet.service(690)
javax.servlet.http.HttpServlet.service(803)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(290)
org.apache.catalina.core.ApplicationFilterChain.doFilter(206)
org.apache.catalina.core.StandardWrapperValve.invoke(233)
org.apache.catalina.core.StandardContextValve.invoke(175)
org.apache.catalina.authenticator.AuthenticatorBase.invoke(433)
org.apache.catalina.core.StandardHostValve.invoke(128)
org.apache.catalina.valves.ErrorReportValve.invoke(102)
org.apache.catalina.valves.AccessLogValve.invoke(568)
org.apache.catalina.core.StandardEngineValve.invoke(109)
org.apache.catalina.connector.CoyoteAdapter.service(286)
org.apache.coyote.http11.Http11NioProcessor.process(879)
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(719)
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(2080)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(885)
java.util.concurrent.ThreadPoolExecutor$Worker.run(907)
java.lang.Thread.run(619)

 

Idle worker thread

http-8000-exec-1
21   TIMED_WAITING  50000000    [EMAIL PROTECTED]aa65fe
-1           41            -1           17           -1
sun.misc.Unsafe.park(-2)
java.util.concurrent.locks.LockSupport.parkNanos(198)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(1963)
java.util.concurrent.LinkedBlockingQueue.poll(395)
java.util.concurrent.ThreadPoolExecutor.getTask(944)
java.util.concurrent.ThreadPoolExecutor$Worker.run(906)
java.lang.Thread.run(619)


 

NIO endpoint

http-8000-ClientPoller
20   RUNNABLE       370000000
-1           285           -1           8            -1
sun.nio.ch.EPollArrayWrapper.epollWait(-2)
sun.nio.ch.EPollArrayWrapper.poll(215)
sun.nio.ch.EPollSelectorImpl.doSelect(65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(69)
sun.nio.ch.SelectorImpl.select(80)
org.apache.tomcat.util.net.NioEndpoint$Poller.run(1473)
java.lang.Thread.run(619)


 

My 4 acceptors

http-8000-Acceptor-3
19   BLOCKED        50000000    [EMAIL PROTECTED]
16           71            -1           0            -1
sun.nio.ch.ServerSocketChannelImpl.accept(129)
org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(1198)
java.lang.Thread.run(619)

http-8000-Acceptor-2
18   BLOCKED        30000000    [EMAIL PROTECTED]
16           48            -1           0            -1
sun.nio.ch.ServerSocketChannelImpl.accept(129)
org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(1198)
java.lang.Thread.run(619)

http-8000-Acceptor-1
17   BLOCKED        40000000    [EMAIL PROTECTED]
16           51            -1           0            -1
sun.nio.ch.ServerSocketChannelImpl.accept(129)
org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(1198)
java.lang.Thread.run(619)

http-8000-Acceptor-0
16   RUNNABLE       80000000
-1           59            -1           0            -1
sun.nio.ch.ServerSocketChannelImpl.accept0(-2)
sun.nio.ch.ServerSocketChannelImpl.accept(145)
org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(1198)
java.lang.Thread.run(619)

 

My workers go all the way up to http-8000-exec-272. It looked like a
handful of them were doing work, but most were idle. Our servlet
requests are very short-lived - 30ms tops.
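To get that busy/idle breakdown, I tally the thread states of the exec pool from a jstack dump; the dump file name below is just an example.

```shell
# Count java.lang.Thread.State values for the http-8000-exec worker threads
# in a jstack output file (the file name is hypothetical).
DUMP=${1:-tomcat.jstack}
grep -A1 '"http-8000-exec' "$DUMP" | grep 'java.lang.Thread.State' \
  | awk '{print $2}' | sort | uniq -c
```

This relies on jstack printing the "java.lang.Thread.State:" line directly under each thread header, which is what I see on our JDK.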

 

Our application works with the normal blocking-socket HTTP connector, but
we would like to use NIO because it would reduce the load on each Tomcat
node and let us scale better. I am very interested in trying any
suggestions or patches you might have. We have a unique scalability
environment for Tomcat - many unique requests from the internet at a
pretty fast pace. If NIO can be made to work in our environment, it would
be a major milestone of stability and performance for it. It may also
simply be a configuration issue. If anyone has any thoughts or comments,
please post.

 

Thanks,

-Emile

 

 
