Hi Mark,

I have updated the Tomcat version to 9.0.21 (the Docker image tag is tomcat:9.0.21-jdk8; sorry for my use of the word "official", the image is actually built by Docker). The Tomcat Native version is 1.2.21, built from the tomcat-native.tar.gz provided in the Tomcat 9.0.21 distribution.
I have tested again with the same configuration and the same steps, and the results are the same as I mentioned above: after a single "keep-alive" request was sent, the next "non-keep-alive" request did not receive a response for a long time (even though its TCP connection was established). I have read the 9.0.21 source code roughly and it seems the mechanism has not changed: the Poller still has a chance to poll events with a very large "nextPollerTime".
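To make the behaviour I mean easier to see, here is a small self-contained sketch. It is my own simplification, not the Tomcat source; the constants (a 2000 microsecond poll slice, a 1024-slot poll set, 150,000 idle iterations) are only illustrative:

// Simplified simulation of the "nextPollerTime" accumulation described above.
// This is NOT the AprEndpoint code, just an illustration of the effect.
public class NextPollerTimeSketch {
    public static void main(String[] args) {
        final long pollerTime = 2_000;       // poll slice in microseconds (illustrative)
        long nextPollerTime = pollerTime;
        final int actualPollerSize = 1024;   // slots per poll set (illustrative)
        int pollerSpace = actualPollerSize;  // free slots; == actualPollerSize means the set is empty

        // Phase 1: only "Connection: close" traffic. The Acceptor handles those
        // sockets directly, nothing is registered with the poller, so every pass
        // skips the wait and accumulates the timeout instead.
        for (int i = 0; i < 150_000; i++) {
            if (pollerSpace < actualPollerSize) {
                nextPollerTime = pollerTime;   // would be reset after a real poll
            } else {
                nextPollerTime += pollerTime;  // empty poll set: keep growing
            }
        }
        System.out.printf("Accumulated nextPollerTime: %,d us (~%d s)%n",
                nextPollerTime, nextPollerTime / 1_000_000);

        // Phase 2: a single keep-alive connection is registered with the poller.
        // The next real poll call would now use the huge timeout above, blocking
        // the Poller thread and the close handling it also performs.
        pollerSpace--;
        System.out.printf("The next poll could block for up to ~%d s%n",
                nextPollerTime / 1_000_000);
    }
}

Once the single keep-alive connection makes the poll set non-empty, the next native poll uses the accumulated timeout, which also delays the close handling that runs on the same Poller thread.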
Best Regards,
Chang Liu

Mark Thomas <ma...@apache.org> wrote on Wed, 3 Jul 2019 at 19:39:
> On 03/07/2019 10:59, Sanford Liu wrote:
> > Hi Team,
> > My team is facing a no-response issue in the below circumstances:
> >
> > 1. Env:
> > Apache Tomcat: 8.5.15, JDK: 1.8.0_121
>
> That Tomcat version is more than 2 years old.
>
> > 2. Tomcat configuration:
> > enable APR: protocol="org.apache.coyote.http11.Http11AprProtocol"
> > set maxThreads of the Executor: maxThreads="1200"
>
> Which version of Tomcat Native are you using?
>
> > 3. This web server was under a massive load.
> > All requests were HTTP 1.1 requests and were marked with a "Connection: close" HTTP header.
> > At this point, the web server showed some latency for the responses, but it was still running.
> >
> > 4. Some "keep-alive" requests were coming in.
> > Those requests were marked with a "Connection: keep-alive" HTTP header.
> >
> > 5. The following non-keep-alive requests were not responding for a long time.
> > We ran a thread dump with jstack at this point and saw that the Acceptor thread was WAITING:
>
> <snip/>
>
> > But the Poller thread was RUNNING:
> > "http-apr-8080-Poller@5809" daemon prio=5 tid=0x25 nid=NA runnable
> >   java.lang.Thread.State: RUNNABLE
> >   at org.apache.tomcat.jni.Poll.poll(Poll.java:-1)
>
> <snip/>
>
> > All the Executor threads were WAITING like this:
> > "http-apr-8080-exec-21@6078" daemon prio=5 tid=0x3f nid=NA waiting
> >   java.lang.Thread.State: WAITING
> >   at sun.misc.Unsafe.park(Unsafe.java:-1)
> >   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>
> <snip/>
>
> > We dived into the source code and found some clues:
> >
> > 1. The Acceptor thread is parked by the LimitLatch. This means the LimitLatch has not been released, which in turn means the previous connections do not perform the close operation in our circumstances. (AprEndpoint.java#L955)
>
> There have been some bugs fixed in this area since 8.5.15.
>
> > 2. The close logic takes place in the Poller thread. (AprEndpoint.java#L1624)
> >
> > 3. If the polling logic takes a lot of time, the Poller thread will be blocked (although its state is still RUNNABLE, it can be blocked inside the native method), and the destroySocket calls will be delayed. (AprEndpoint.java#L1680)
> >
> > 4. Because the Acceptor processes new connections directly (it does not register them with the poller, AprEndpoint.java#L2268), pollerSpace[i] always equals actualPollerSize in this case (AprEndpoint.java#L1679) and "nextPollerTime" grows very large. But when some "keep-alive" requests arrive, the Handler implementation processes those connections and registers each one back with the poller (AbstractProtocol.java#L933). pollerSpace then changes, and the Poller uses the large value of "nextPollerTime" to poll the socket, so the Poller thread can be blocked for a long time.
> >
> > To prove that, we set up a similar environment to reproduce this issue:
> >
> > 1. Use the official Tomcat docker image (tomcat:8.5.15) to run a simple HTTP server
>
> Note: There is no ASF provided official docker image. If 8.5.15 is the latest version provided by Docker I'd strongly recommend that you use a more up to date version of Tomcat.
>
> > 2. Change the config:
> > <Connector
> >     port="8080"
> >     protocol="org.apache.coyote.http11.Http11AprProtocol"
> >     connectionTimeout="20000"
> >     redirectPort="8443"
> >     maxConnections="20" />  // make it easy to reach maxConnections during testing
> >
> > 3. Keep sending lots of "non-keep-alive" requests for 5 min:
> > $ wrk2 -t8 -c32 -d10m -R128 -s ./closed.lua http://127.0.0.1:8080/hello?latency=10
> >
> > 4. Send a single "keep-alive" request and do not close this connection on the client side
> >
> > 5. After that, send another "non-keep-alive" request. We can see that no response is returned in a reasonable time (we waited 30 sec).
> >
> > A workaround:
> > By setting deferAccept="false" in the connector configuration, we can force the Acceptor to register the new connection with the poller (AprEndpoint.java#L2250), and "nextPollerTime" no longer grows out of control.
> >
> > So, is that a real issue for Tomcat?
>
> It does look like there is an edge case here that isn't handled correctly.
>
> The use of multiple pollers stems from a work-around for Windows platforms that are now obsolete. I thought we discussed removing that code. I need to find that discussion and remind myself of the conclusion.
>
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
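P.S. For completeness, the workaround described above is just the reproduction connector with deferAccept="false" added; roughly (the other attributes are unchanged):

<!-- Sketch of the workaround: deferAccept="false" makes the Acceptor register
     new connections with the Poller instead of processing them directly. -->
<Connector
    port="8080"
    protocol="org.apache.coyote.http11.Http11AprProtocol"
    connectionTimeout="20000"
    redirectPort="8443"
    maxConnections="20"
    deferAccept="false" />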