Hi,

We observe latency degradation with NIO2 compared to NIO1 for new
connections when the server is under stress.
I created https://github.com/apache/tomcat/pull/371 and would very
much appreciate your feedback. More details are below:

Under regular load, we observe that NIO2 has better CPU utilization and
slightly better latency than NIO1. However, under heavy load or during
spikes, we observe that the 99th percentile latency of NIO2 is
significantly worse than that of NIO1.

   - We see that the latency is worse compared to NIO1 only for the
   requests that end up establishing a new connection.
   - When maxKeepAliveRequests=-1, we do not observe the latency
   degradation with NIO2 at all. But, we need and use Tomcat’s
   maxKeepAliveRequests feature, so setting it to -1 is not an option.
   - The problem occurs only during spikes/high load, e.g., with
   maxThreads=40 and 100 concurrent users sending requests.
   - The issue happens only for blocking scenarios, where Tomcat threads
   are blocked during request processing. When non-blocking programming is
   used (where Tomcat threads are freed quickly), we do not see any
   degradation compared to NIO1. However, we have many applications that would
   not be able to adopt non-blocking anytime soon.

While looking at the Tomcat code, we saw that the accept is scheduled to run
on the thread pool. Our hypothesis was that this scheduling delays the
accept significantly under heavy load: a request that establishes a new
connection has to go through the TaskQueue twice, once for the accept and
once for the request processing.
To test this, we set maxConnections=-1 to see the impact of an immediate
accept call (in our test, the number of concurrent connections does not get
close to maxConnections anyway). With that setting, the 99th percentile
latency problem with NIO2 disappeared.
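
To illustrate the hypothesis, here is a minimal, self-contained sketch (not
Tomcat code). It assumes a plain fixed-size java.util.concurrent pool as a
stand-in for Tomcat's executor and TaskQueue, and the class and method names
(AcceptQueueDelay, acceptDelayMillis) are made up for this example. It only
shows that a task submitted behind blocking work waits in the queue until a
worker thread is free, which is the delay we suspect the accept suffers from.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AcceptQueueDelay {

    /**
     * Returns how long (in ms) a stand-in "accept" task waits before it starts
     * running on a pool with the given number of worker threads, when
     * blockedRequests blocking tasks of blockMillis each were queued first.
     */
    static long acceptDelayMillis(int threads, int blockedRequests, long blockMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Simulate blocking request processing that occupies the workers.
            for (int i = 0; i < blockedRequests; i++) {
                pool.execute(() -> sleepQuietly(blockMillis));
            }
            long submitted = System.nanoTime();
            long[] started = new long[1];
            CountDownLatch ran = new CountDownLatch(1);
            // The stand-in for the accept: it must wait behind the queued work.
            pool.execute(() -> {
                started[0] = System.nanoTime();
                ran.countDown();
            });
            ran.await();
            return TimeUnit.NANOSECONDS.toMillis(started[0] - submitted);
        } finally {
            pool.shutdown();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("idle pool delay (ms): " + acceptDelayMillis(2, 0, 50));
        System.out.println("busy pool delay (ms): " + acceptDelayMillis(2, 6, 50));
    }
}
```

With 2 workers and 6 blocking tasks of 50 ms each, the stand-in accept can
only start after roughly three rounds of blocking work have finished, while
on an idle pool it starts almost immediately. The real executor differs in
detail, so this is only a model of the queueing effect.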

Accordingly, I am proposing a change to call accept immediately if the number
of connections is not close to maxConnections.
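
The decision described above could be sketched roughly as follows. This is
an illustration only and not the code in the pull request: the class name,
the method name, and the headroom parameter are hypothetical, and how close
"close to maxConnections" should be is an assumption left to the caller.
The only Tomcat behavior it relies on is that maxConnections=-1 disables
the connection limit.

```java
public class InlineAcceptPolicy {

    /**
     * Decides whether the next accept may be issued immediately instead of
     * being scheduled on the thread pool.
     *
     * @param currentConnections number of currently open connections
     * @param maxConnections     configured limit, or -1 for no limit
     * @param headroom           hypothetical safety margin below the limit
     */
    static boolean acceptInline(long currentConnections, int maxConnections, int headroom) {
        if (maxConnections == -1) {
            // No connection limit configured, so there is nothing to wait for.
            return true;
        }
        // Accept inline only while we are comfortably below the limit.
        return currentConnections < (long) maxConnections - headroom;
    }

    public static void main(String[] args) {
        System.out.println(acceptInline(10, -1, 50));    // no limit
        System.out.println(acceptInline(100, 1000, 50)); // far below the limit
        System.out.println(acceptInline(980, 1000, 50)); // close to the limit
    }
}
```

When the check fails, the accept would fall back to the existing scheduled
path, so the maxConnections limit is still enforced.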

Thanks!
