-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Emmanuel,

On 10/4/19 16:38, Emmanuel Lecharny wrote:
> Hi remy,
>
> On 2019/10/04 15:37:36, Rémy Maucherat <r...@apache.org> wrote:
>> On Fri, Oct 4, 2019 at 3:40 PM Emmanuel Lecharny
>> <elecha...@apache.org> wrote:
>>
>>> Hi !
>>>
>>> I filed a ticket yesterday about a problem we face with many NIO
>>> frameworks, which I think could hit Tomcat too (see
>>> https://bz.apache.org/bugzilla/show_bug.cgi?id=63802).
>>> Actually, I think I'm facing this problem on a project I'm
>>> working on at the moment.
>>>
>>> Remy suggested we discuss it on this mailing list.
>>>
>>> Bottom line, what happens is that under some circumstances that
>>> are not well defined, the call to select() might end up in an
>>> infinite loop eating all the CPU (select() returns 0, so select
>>> is immediately called again, and we loop).
>>>
>>> In various NIO frameworks (and being a MINA committer, I have
>>> implemented the discussed workaround), we control this
>>> situation by breaking the infinite loop this way:
>>> - if the select() call returns 0
>>> - then if we have called select() more than N times in less
>>>   than M ms (N=10, M=100 in MINA)
>>> - then we create a new Selector, register all the SelectionKeys
>>>   that were registered on the broken selector, and ditch the
>>>   old selector.
>>>
>>> This workaround does not cost a lot when the selector works as
>>> designed, as a select() call should never return 0.
>>>
>>
>> There's actually a very similar hack for APR that has been placed
>> by myself a long time ago [
>> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/AprEndpoint.java#L1410
>> ], I don't even know if it's actually useful and it's certainly not
>> testable. Overall what it does is pretty terrible :(
>>
>> Personally I would like to know more about this "long lived bug
>> either in the JDK or even in Linux epoll implementation" like
>> actual platform details and JVM versions used since I've never
>> heard about it in the first place.
>
> For the record, I had a discussion yesterday with one of my close
> friends and co-worker back in the 90's. He remembers clearly, while
> working on the SUN TCP stack, that such a problem occurred back
> then. Yes, 25 years ago... Ok, that was just for fun, it's
> likely to be perfectly unrelated ;-)
>
> At MINA, we were hit by this bug in 2009 (see
> https://issues.apache.org/jira/browse/DIRMINA-678), and it was
> linked to a bug reported on Jetty
> (http://jetty.4.x6.nabble.com/jira-Created-JETTY-937-SelectChannelConnector-100-CPU-usage-on-Linux-td36385.html),
> itself related to some JDK bugs, supposedly fixed since then.
>
> I had a long conversation with Jean-François Arcand somewhere
> around this date, and he suggested we adopt the same workaround he
> applied to Grizzly. We also had a convo with Alan Bateman during a
> Java One in SF, but nothing specific resulted from this convo,
> except that, AFAICR, he acknowledged there is an issue.
>
> So this problem started with JDK 6, but I can't guarantee it wasn't
> already present in JDK 5 or 4, on linux, and not on any other OS
> like windows or Mac OSX. It's not exactly fresh in my mind, because
> it was already 10 years ago.
>
>> Also I'd like to know since NIO2 doesn't expose its poller and
>> almost certainly doesn't have such a platform specific mysterious
>> thing inside it [we can check I guess].
>
> No idea, but I think NIO.2 has just added some coating over what
> was NIO.1 (gut feeling here...).
>
>> In the context of NIO, do you have evidence the
>> hack has been tested to work (besides avoiding the CPU loop) and
>> allowed the server to continue its regular operation without any
>> impact ?
>
> Absolutely. We do log in MINA when a new selector is created, and
> we have had some issue related to a case where this piece of code
> was called, fixed since :
> https://issues.apache.org/jira/browse/DIRMINA-762?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
>
>  So we definitively know that people get hit by the initial issue
> (select returns 0), a new selector is being created, and everything
> is fine from the user perspective (I do believe that creating the
> new selector and registering all the SelectionKey on it is not
> worse than having to restart the server manually...)
>
> In any case, Grizzly has probably the best possible approach to
> this problem: make the workaround optional.
>
> For Tomcat, I'm tempted to use the Http11AprProtocol class instead
> of the NIO one, as one can swap the protocol in the configuration,
> but the impact is that you need OpenSSL already installed on your
> machine. That would be an acceptable workaround in my case, but a
> painful one. A similar approach would be pleasant to have : a
> Http11NIONoSpinProtocol class that we can use if needed.

I'm inclined to just build this into the standard protocol class with
some good documentation explaining why the hack is in there. You will
never know you need it until you suddenly need it, and then it's too
late.

Is this only a problem when select() returns 0? That is... is there
really a reason to do the N times in M ms check? Can we simply replace
the Selector if select() ever returns 0? Or are there legitimate
use-cases for that return value under certain circumstances?
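
For what it's worth, as far as I can tell from the Selector javadoc,
select() can return 0 without anything being broken, for example when
a timeout expires or when wakeup() has been called. A minimal sketch
to check that (the class and method names are made up for
illustration, and no channels are registered at all):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical demo class: shows two ordinary ways select() reports 0 keys.
class ZeroSelectDemo {

    // select(timeout) with nothing registered: blocks until the timeout
    // elapses, then returns the number of ready keys.
    static int selectWithTimeout() {
        try (Selector selector = Selector.open()) {
            return selector.select(50);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // wakeup() before select(): the next select() returns immediately
    // instead of blocking, again with no ready keys.
    static int selectAfterWakeup() {
        try (Selector selector = Selector.open()) {
            selector.wakeup();
            return selector.select();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("after timeout: " + selectWithTimeout());
        System.out.println("after wakeup:  " + selectAfterWakeup());
    }
}
```

If that holds, replacing the Selector on the very first 0 would also
fire on ordinary wakeups and timeouts, so some kind of threshold
looks necessary.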

Instead of implementing N / M, why not simply maintain a counter of
"useless select()s" and then replace the Selector when the count gets
too high? Or, perhaps a tweak, something like this (pseudocode):

    int badness = 0;

    while(dontStop) {
        if(0 == select(..)) {
          // select() came back with no ready keys
          badness++;

          if(badness > threshold) {
              // replace selector
              badness = 0;
          }
        } else {
          // do useful work

          // decay the counter, but never below zero
          badness = Math.max(0, badness - 1);
        }
    }
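
To make the counter idea a bit more concrete, here is a rough Java
sketch. It is only an illustration under my own assumptions, not what
MINA, Grizzly or Tomcat actually do: the SpinGuard class, its record()
and rebuild() methods and the threshold value are all hypothetical
names. The counter is clamped at zero with Math.max, and rebuild()
re-registers every valid key on a fresh Selector using only standard
java.nio calls.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper: counts "useless" select() calls and says when to rebuild.
final class SpinGuard {
    private final int threshold;
    private int badness = 0;

    SpinGuard(int threshold) {
        this.threshold = threshold;
    }

    // Feed the return value of select(); true means "replace the selector now".
    boolean record(int selected) {
        if (selected == 0) {
            badness++;
            if (badness > threshold) {
                badness = 0;          // start counting afresh on the new selector
                return true;
            }
        } else {
            // Useful work happened: decay the counter, never below zero.
            badness = Math.max(0, badness - 1);
        }
        return false;
    }

    int badness() {
        return badness;
    }

    // Move all valid registrations to a new Selector and close the old one.
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue;             // skip keys that were already cancelled
            }
            SelectableChannel channel = key.channel();
            channel.register(fresh, key.interestOps(), key.attachment());
        }
        old.close();                  // closing deregisters the old keys
        return fresh;
    }
}
```

In the select loop this would be used roughly as
"if (guard.record(selector.select(timeout))) selector =
SpinGuard.rebuild(selector);". Because occasional zero returns decay
away when real work shows up in between, only a sustained run of them
should trigger a rebuild.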

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl2Xy/UACgkQHPApP6U8
pFiVNw/9EFAX0D550z0lQuB1Nt8e0ob4pHyZqoQYuNd69kSpKhoEKjy8Dn7E761g
xN3LHSh6I+REL4iFntGmVOvao5sCtPezqZfUaXMBvIJOnVpSCfEJVtmXFnZVGwqD
3OCOG+2C+77iYpVjVuI4eEqjYZqwRolKYMtlTN5XDla4mO7at6hlp6INYToIiKjh
ELtStZTqQuurOORqb0K2HLY+rcUJAaJA69sXeWQTuvIgMDbK1nars2m0/n0Vzqw8
DjTEbPTNBAFge8LWSIfX1W2cvGS1gofDh2Hx+r5lBPPxANqHpToG3wo82Vp170tO
mUjtEMU91zeJfXVHHIHfZV2Nb8yvBJVd4MpDq/h/5xMyvTpRPT+1m5BasnhKApXp
gmGXs9nct0BSXpjniYYMSHuQY2A0//ctoOsVYumm+IwCFY6Bmhnw22VQ6koLlsah
QCgvkoDlxRx+0Dow/2LOwi/5lfTWT+Z2M1esLvOwXH2JEautCz5PZxCCxVxbUtEM
nTrQSlSRKIH07MufkRDt4Ft9NPRDoaQ3rab13js+YfrbY7G1nBlK2kNqeZ/trIaI
K3P82RKrtj3ZFdFa28D03CpPtzFIAujhMO+g4LfAMajJb3bHRNh44DL3VNvntNhR
+ym3hrmSj8nAZnrMciLYNF0EQuGufy6q8Oox6XPwoertZNrRnMY=
=mZdo
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
