Hi Rémy,

On 2019/10/04 15:37:36, Rémy Maucherat <r...@apache.org> wrote: 
> On Fri, Oct 4, 2019 at 3:40 PM Emmanuel Lecharny <elecha...@apache.org>
> wrote:
> > Hi !
> >
> > I filed a ticket yesterday about a problem we face with many NIO frameworks,
> > which I think could hit Tomcat too (see
> > https://bz.apache.org/bugzilla/show_bug.cgi?id=63802). Actually, I think
> > I'm facing this problem on a project I'm working on atm.
> >
> > Remy suggested we discuss it on this mailing list.
> >
> > Bottom line, what happens is that under some circumstances that are not
> > well defined, the call to select() might end up in an infinite loop eating
> > all the CPU (select() returns 0, so select() is immediately called again,
> > and we loop).
> >
> > In various NIO frameworks - and being a MINA committer, I have implemented
> > the discussed workaround -, we are controlling this situation by breaking
> > this infinite loop this way:
> > - if the select() call returns 0
> > - then if we have called select() more than N times in less than M ms
> > (N=10, M=100 in MINA)
> > - then we create a new Selector, register all the selectionKey that were
> > registered on the broken selector, and ditch the old selector.
> >
> > This workaround does not cost a lot when the selector works as designed,
> > as a select() call should never return 0.
> >
> There's actually a very similar hack for APR that has been placed by myself
> a long time ago [
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/AprEndpoint.java#L1410
> ], I don't even know if it's actually useful and it's certainly not
> testable. Overall what it does is pretty terrible :(
> Personally I would like to know more about this "long lived bug either in
> the JDK or even in Linux epoll implementation" like actual platform details
> and JVM versions used since I've never heard about it in the first place.

For the record, I had a discussion yesterday with one of my close friends, who 
was a co-worker back in the 90's. He remembers clearly that, while he was 
working on the SUN TCP stack, such a problem occurred back then. Yes, 25 years 
ago... OK, that was just for fun, it's likely to be perfectly unrelated ;-)

At MINA, we were hit by this bug in 2009 (see 
https://issues.apache.org/jira/browse/DIRMINA-678), and it was linked to a bug 
reported on Jetty, itself related to some JDK bugs, supposedly fixed since then.

I had a long conversation with Jean-François Arcand somewhere around that date, 
and he suggested we adopt the same workaround he applied to Grizzly. We also 
had a conversation with Alan Bateman during a JavaOne in SF, but nothing 
specific came out of it, except that, AFAICR, he acknowledged there is an issue.

So this problem showed up with JDK 6 on Linux (I can't guarantee it wasn't 
already present in JDK 5 or 4), and not on any other OS like Windows or Mac 
OS X. It's not exactly fresh in my mind, because it was already 10 years ago.

> Also I'd like to know since NIO2 doesn't expose its poller and almost
> certainly doesn't have such a platform specific mysterious thing inside it
> [we can check I guess]. 

No idea, but I think NIO.2 has just added some coating over what was NIO.1 
(gut feeling here...).

> In the context of NIO, do you have evidence the
> hack has been tested to work (besides avoiding the CPU loop) and allowed
> the server to continue its regular operation without any impact ?

Absolutely. We do log in MINA when a new selector is created, and we have had 
an issue reported in a case where this piece of code was called, which has 
since been fixed.

So we definitely know that people get hit by the initial issue (select() 
returns 0), a new selector is created, and everything is fine from the user's 
perspective (I do believe that creating the new selector and registering all 
the SelectionKeys on it is not worse than having to restart the server).
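
For what it's worth, here is a rough sketch of the workaround as I described 
it earlier in the thread. This is not the actual MINA, Grizzly or Tomcat code: 
the SpinGuard class and its method names are made up for illustration, and the 
thresholds are simply the N=10 / M=100 ms values mentioned above. It assumes 
everything runs on the thread that owns the selector.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;

/** Hypothetical helper illustrating the "rebuild the selector" workaround. */
final class SpinGuard {
    private static final int MAX_EMPTY_SELECTS = 10;          // N
    private static final long WINDOW_NANOS = 100_000_000L;    // M = 100 ms

    private int emptySelects;   // consecutive select() calls that returned 0
    private long windowStart;   // time of the first empty select in the window

    /** Call with the result of each select(); true means "looks like a spin". */
    boolean isSpinning(int selected) {
        return isSpinning(selected, System.nanoTime());
    }

    /** Same check with an explicit clock value, which makes it testable. */
    boolean isSpinning(int selected, long nowNanos) {
        if (selected != 0) {
            emptySelects = 0;               // real work was done, not spinning
            return false;
        }
        if (emptySelects == 0 || nowNanos - windowStart > WINDOW_NANOS) {
            windowStart = nowNanos;         // start a new observation window
            emptySelects = 1;
            return false;
        }
        emptySelects++;
        if (emptySelects > MAX_EMPTY_SELECTS) {
            emptySelects = 0;               // more than N empty selects in M ms
            return true;
        }
        return false;
    }

    /** Moves every valid registration to a fresh selector, closes the old one. */
    static Selector rebuild(Selector broken) throws IOException {
        Selector fresh = Selector.open();
        // Copy the key set first so we never iterate a set the selector mutates.
        for (SelectionKey key : new ArrayList<>(broken.keys())) {
            if (!key.isValid()) {
                continue;                   // cancelled or closed, nothing to move
            }
            SelectableChannel channel = key.channel();
            channel.register(fresh, key.interestOps(), key.attachment());
        }
        broken.close();                     // also cancels the old keys
        return fresh;
    }
}
```

In a poller loop, one would call guard.isSpinning(selector.select(timeout)) 
after each select and, when it returns true, log it and replace the selector 
with SpinGuard.rebuild(selector). A legitimate select() that returns 0 after 
its timeout expires takes longer than the window, so it does not trigger a 
rebuild.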

In any case, Grizzly has probably the best possible approach to this problem: 
make the workaround optional. 

For Tomcat, I'm tempted to use the Http11AprProtocol class instead of the NIO 
one, as one can swap the protocol in the configuration, but the downside is 
that you need OpenSSL already installed on your machine. That would be an 
acceptable workaround in my case, but a painful one. A similar approach would 
be pleasant to have: an Http11NIONoSpinProtocol class that we can use if needed.


