Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-28 Thread Christopher Schultz

David,

On 8/27/20 18:14, David wrote:
>> I used the http to 8080 in order to read the Tomcat webmanager
>> stats.   I originally had issues with the JVM being too small,
>> running out of memory, CPU spiking, threads maxing out, and
>> whole system instability.  Getting more machine memory and upping
>> the JVM allocation has remedied all of that except for apparently
>> the thread issue.
What is the memory size of the server and of the JVM?

>> I'm unsure that they were aging at that time as I couldn't get
>> into anything, but with no room for GC to take place it would
>> make sense that the threads would not be released.

That's not usually an issue, unless the application is using a
significant amount of memory during a request and then releasing it
after the request has completed.

>> My intention was to restart Tomcat nightly to lessen the chance
>> of an occurrence until I could find a way to restart Tomcat based
>> on the thread count and script a thread dump at the same time,
>> (likely through Solarwinds).  Now that you've explained that the
>> NIO threads are a part of the system threads, I may be able to
>> script something like that directly on the system, with a
>> crontab to check the count and, if > 295 contain NIO, dump the
>> threads / systemctl stop-start tomcat.

I wouldn't do that. Just because the threads exist does not mean they
are stuck. They may be doing useful work or otherwise running just
fine. I would look for other ways to detect problems.
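
One option in that direction: watch the connector's thread pool over JMX
and alert (or trigger a thread dump) before it is exhausted, rather than
blindly restarting. A minimal sketch only; the MBean name
Catalina:type=ThreadPool,name="https-jsse-nio-8443" is an assumption based
on your connector name (confirm it with jconsole), and the code has to run
inside the Tomcat JVM (e.g. a small diagnostic servlet) or be adapted to a
remote JMXConnector:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Sketch: read the HTTPS connector's thread-pool usage via JMX.
    // The ObjectName below is an assumption -- verify it with jconsole.
    public class ConnectorThreadCheck {

        public static void check() throws Exception {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName pool = new ObjectName(
                    "Catalina:type=ThreadPool,name=\"https-jsse-nio-8443\"");

            int busy = (Integer) mbs.getAttribute(pool, "currentThreadsBusy");
            int max  = (Integer) mbs.getAttribute(pool, "maxThreads");

            System.out.println("busy=" + busy + " max=" + max);
            if (busy >= max - 5) {
                // Alert or take a thread dump here instead of stop-starting Tomcat.
                System.err.println("HTTPS connector thread pool nearly exhausted");
            }
        }
    }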

>> That's very heartening, as it seems a viable way to get the data I
>> need without posing much impact to users.  Your explanation of
>> threads leads me to believe that the nightly restart may be
>> rather moot, as it could well be exhaustion on the downstream side
>> causing the backup on the front end.  I didn't see these
>> connected in this way and assumed they were asynchronous and
>> independent processes.  There are timeouts configured for all the
>> DB2 backend connections, and I was in the mindset that the lowest
>> timeout would kill all connections upstream/downstream by
>> presenting the application a "forcibly closed by remote host" or a
>> timeout.

If you can suffer through a few more incidents, you can probably get a
LOT more information about the root problem and maybe even get it
solved, instead of just trying to stop the bleeding.

>> I greatly appreciate the assistance.  In looking through various
>> articles none of this was really discussed, because either
>> everyone knows it, or maybe it was discussed on a level where I
>> couldn't understand it.  There certainly don't seem to be any
>> other instances of connections being open for 18-45 minutes, or if
>> there are, it's not an issue for them.

If you have a load-balancer (which you do), then I'd expect HTTP
keep-alives to keep those connections open literally all day long,
only maybe expiring when you have configured them to expire "just in
case" or maybe after some amount of inactivity. For an lb environment,
I'd want those keep-alive timeouts to be fairly high so you don't
waste any time re-constructing sockets between the lb and the app
server.

When an lb is NOT in the mix, you generally want /low/ keep-alive
timeouts because you can't rely on clients sticking around for very
long and you want to get them off your doorstep ASAP.

>> During a normal glance at the manager page, there are no
>> connections and maybe like 5 empty lines in a "Ready" stage,
>> even if I spam the server's logon landing page I can never see a
>> persistent connection, so it baffled me as to how connections
>> could hang and build up, so I'm thinking something was perhaps
>> messed up with the backend.

If by "backend" you mean like databasse, etc. then that is probably
the issue. The login page is (realtively) static, so it's very
difficult to put Tomcat under such load that it's hosing just giving
you that same page over and over again.

I don't know what your "spamming" strategy is, but you might want to
use a real load-generating tool like ApacheBench (ab) or, even better,
JMeter which can actually swarm among several machines to basically
DDoS your internal servers, which can be useful sometimes for
stress-testing. But your tests really do have to comprise a realistic
scenario, not just hammering on the login page all day.
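
If you want something quick and scriptable before setting up JMeter, a
crude concurrent-request sketch is below; the URL and the counts are
placeholders, and it is in no way a substitute for a realistic JMeter
test plan:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Crude load sketch (Java 11+): fires concurrent GETs at a placeholder URL
    // so you can watch connector threads/connections while load is applied.
    public class CrudeLoad {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://app.example.com/login")).build(); // placeholder

            ExecutorService pool = Executors.newFixedThreadPool(50);
            for (int i = 0; i < 1000; i++) {
                pool.submit(() -> {
                    try {
                        HttpResponse<Void> resp = client.send(
                                request, HttpResponse.BodyHandlers.discarding());
                        System.out.println(resp.statusCode());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.MINUTES);
        }
    }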

>> The webapp names /URL's for the oldest connections didn't
>> coincide between the two outages, so I kind of brushed it off as
>> being application specific, however it may still be.
>
>> I need it to occur again and get some dumps!

Unfortunately, yes.

-chris

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread David
On Thu, Aug 27, 2020 at 4:30 PM Christopher Schultz wrote:
>
>
> David,
>
> On 8/27/20 17:14, David wrote:
> > Thank you all for the replies!
> >
> > On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz
> >  wrote:
> >>
> > David,
> >
> > On 8/27/20 13:57, David wrote:
>  On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
>   wrote:
> >
>  David,
> 
>  On 8/27/20 10:48, David wrote:
> >>> In the last two weeks I've had two occurrences where a
> >>> single CentOS 7 production server hosting a public
> >>> webpage has become unresponsive. The first time, all
> >>> 300 available "https-jsse-nio-8443" threads were
> >>> consumed, with the max age being around 45minutes, and
> >>> all in a "S" status. This time all 300 were consumed in
> >>> "S" status with the oldest being around ~16minutes. A
> >>> restart of Tomcat on both occasions freed these threads
> >>> and the website became responsive again. The
> >>> connections are post/get methods which shouldn't take
> >>> very long at all.
> >>>
> >>> CPU/MEM/JVM all appear to be within normal operating
> >>> limits. I've not had much luck searching for articles
> >>> for this behavior nor finding remedies. The default
> >>> timeout values are used in both Tomcat and in the
> >>> applications that run within as far as I can tell.
> >>> Hopefully someone will have some insight on why the
> >>> behavior could be occurring, why isn't Tomcat killing
> >>> the connections? Even in a RST/ACK status, shouldn't
> >>> Tomcat terminate the connection without an ACK from the
> >>> client after the default timeout?
> 
>  Can you please post:
> 
>  1. Complete Tomcat version
> > I can't find anything more granular than 9.0.29, is there
> > a command to show a sub patch level?
> >
> > 9.0.29 is the patch-level, so that's fine. You are about 10
> > versions out of date (~1 year). Any chance for an upgrade?
> >
> >> They had to re-dev many apps last year when we upgraded from I
> >> want to say 1 or 3 or something equally as horrific.  Hopefully
> >> they are forward compatible with the newer releases and if not
> >> should surely be tackled now before later, I will certainly bring
> >> this to the table!
>
> I've rarely been bitten by an upgrade from foo.bar.x to foo.bar.y.
> There is a recent caveat if you are using the AJP connector, but you
> are not so it's not an issue for you.
>
>  2. Connector configuration (possibly redacted)
> > This is the 8443 section of the server.xml *8080 is
> > available during the outage and I'm able to curl the
> > management page to see the 300 used threads, their status,
> > and age* 
> >
> > [snip]
> >
> > <Connector ... connectionTimeout="2" redirectPort="8443" />
> > [snip]
> > <Connector port="8443"
> >            protocol="org.apache.coyote.http11.Http11NioProtocol"
> >            maxThreads="300" SSLEnabled="true">
> >   <SSLHostConfig>
> >     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >                  certificateKeystorePassword="redacted" type="RSA" />
> >   </SSLHostConfig>
> > </Connector>
> > [snip]
> > <Connector port="8443"
> >            protocol="org.apache.coyote.http11.Http11NioProtocol"
> >            maxThreads="300" SSLEnabled="true">
> >   <SSLHostConfig protocols="TLSv1.2">
> >     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >                  certificateKeystorePassword="redacted" type="RSA" />
> >   </SSLHostConfig>
> > </Connector>
> >
> > What, two connectors on one port? Do you get errors when starting?
> >> No errors; one is "with HTTP2". Should I delete the other (former) one?
>
> Well, one of them will succeed in starting and the other one should
> fail. Did you copy/paste your config without modification? Weird that
> you don't have any errors. Usually you'll get an IOException or
> similar when binding to the port twice.

I do recall IOExceptions and "port already in use" errors that caused
Tomcat to not start, but I think these were related to syntax errors
when defining catalina variables for my JVM sizing.  I'll take another
look at catalina.out and ensure I don't still see these, and will
likely clean up the non "with http2" connector out of the config
regardless. The only edits to the section of the supplied xml were the
.jks store name and pw.
>
> > I don't see anything obviously problematic in the above
> > configuration (other than the double-definition of the 8443
> > connector).
> >
> > 300 tied-up connections (from your initial report) sounds like a
> > significant number: probably the thread count.
> >> Yes sir, that's the NIO thread count for the 8443 connector.
> >
> > Mark (as is often the case) is right: take some thread dumps next
> > time everything locks up and see what all those threads are doing.
> > Often, it's something like everything is awaiting on a db
> > connection and the db pool has been exhausted or something.
> > Relatively simple quick-fixes are available for that, and better,
> > 

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread Christopher Schultz

David,

On 8/27/20 17:14, David wrote:
> Thank you all for the replies!
>
> On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz
>  wrote:
>>
> David,
>
> On 8/27/20 13:57, David wrote:
 On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
  wrote:
>
 David,

 On 8/27/20 10:48, David wrote:
>>> In the last two weeks I've had two occurrences where a
>>> single CentOS 7 production server hosting a public
>>> webpage has become unresponsive. The first time, all
>>> 300 available "https-jsse-nio-8443" threads were
>>> consumed, with the max age being around 45minutes, and
>>> all in a "S" status. This time all 300 were consumed in
>>> "S" status with the oldest being around ~16minutes. A
>>> restart of Tomcat on both occasions freed these threads
>>> and the website became responsive again. The
>>> connections are post/get methods which shouldn't take
>>> very long at all.
>>>
>>> CPU/MEM/JVM all appear to be within normal operating
>>> limits. I've not had much luck searching for articles
>>> for this behavior nor finding remedies. The default
>>> timeout values are used in both Tomcat and in the
>>> applications that run within as far as I can tell.
>>> Hopefully someone will have some insight on why the
>>> behavior could be occurring, why isn't Tomcat killing
>>> the connections? Even in a RST/ACK status, shouldn't
>>> Tomcat terminate the connection without an ACK from the
>>> client after the default timeout?

 Can you please post:

 1. Complete Tomcat version
> I can't find anything more granular than 9.0.29, is there
> a command to show a sub patch level?
>
> 9.0.29 is the patch-level, so that's fine. You are about 10
> versions out of date (~1 year). Any chance for an upgrade?
>
>> They had to re-dev many apps last year when we upgraded from I
>> want to say 1 or 3 or something equally as horrific.  Hopefully
>> they are forward compatible with the newer releases and if not
>> should surely be tackled now before later, I will certainly bring
>> this to the table!

I've rarely been bitten by an upgrade from foo.bar.x to foo.bar.y.
There is a recent caveat if you are using the AJP connector, but you
are not so it's not an issue for you.

 2. Connector configuration (possibly redacted)
> This is the 8443 section of the server.xml *8080 is
> available during the outage and I'm able to curl the
> management page to see the 300 used threads, their status,
> and age* 
>
> [snip]
>
> <Connector ... connectionTimeout="2" redirectPort="8443" />
> [snip]
> <Connector port="8443"
>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>            maxThreads="300" SSLEnabled="true">
>   <SSLHostConfig>
>     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>                  certificateKeystorePassword="redacted" type="RSA" />
>   </SSLHostConfig>
> </Connector>
> [snip]
> <Connector port="8443"
>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>            maxThreads="300" SSLEnabled="true">
>   <SSLHostConfig protocols="TLSv1.2">
>     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>                  certificateKeystorePassword="redacted" type="RSA" />
>   </SSLHostConfig>
> </Connector>
>
> What, two connectors on one port? Do you get errors when starting?
>> No errors; one is "with HTTP2". Should I delete the other (former) one?

Well, one of them will succeed in starting and the other one should
fail. Did you copy/paste your config without modification? Weird that
you don't have any errors. Usually you'll get an IOException or
similar when binding to the port twice.

> I don't see anything obviously problematic in the above
> configuration (other than the double-definition of the 8443
> connector).
>
> 300 tied-up connections (from your initial report) sounds like a
> significant number: probably the thread count.
>> Yes sir, that's the NIO thread count for the 8443 connector.
>
> Mark (as is often the case) is right: take some thread dumps next
> time everything locks up and see what all those threads are doing.
> Often, it's something like everything is awaiting on a db
> connection and the db pool has been exhausted or something.
> Relatively simple quick-fixes are available for that, and better,
> longer-term fixes as well.
>
>> Mark/Chris  is there a way to dump the connector threads
>> specifically? Or simply is it all contained as a machine/process
>> thread?  Sorry I'm not really a Linux guy.

Most of the threads in the server will be connector threads. They will
have names like https-nio-[port]-exec-[number].

If you get a thread dump[1], you'll get a stack trace from every thread.

Rainer wrote a great presentation about them in the context of Tomcat.
Feel free to give it a read:
http://home.apache.org/~rjung/presentations/2018-06-13-ApacheRoadShow-JavaThreadDumps.pdf
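
If you only care about those connector threads, you can also grab them
programmatically and filter by name. A minimal sketch, assuming it runs
inside the Tomcat JVM (e.g. from a diagnostic JSP) and that your threads
follow the https-jsse-nio-8443-exec-* naming above; a full dump via
kill -3 <pid> or jstack <pid>, as in Rainer's slides, is still the better
first step since it shows every thread:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Sketch: print only the HTTPS connector's request-processing threads.
    // The name filter is an assumption based on the naming convention above.
    public class ConnectorThreadDump {

        public static void dumpConnectorThreads() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
                if (info.getThreadName().contains("nio-8443-exec")) {
                    System.out.println("\"" + info.getThreadName() + "\" "
                            + info.getThreadState());
                    for (StackTraceElement frame : info.getStackTrace()) {
                        System.out.println("    at " + frame);
                    }
                }
            }
        }
    }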

 Do you have a single F5 or a group of them?
> A group of them, several HA pairs depending on internal or
> external and application.  This server is behind one HA pair and is
> a single server.

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread David
Thank you all for the replies!

On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz wrote:
>
>
> David,
>
> On 8/27/20 13:57, David wrote:
> > On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
> >  wrote:
> >>
> > David,
> >
> > On 8/27/20 10:48, David wrote:
>  In the last two weeks I've had two occurrences where a
>  single CentOS 7 production server hosting a public webpage
>  has become unresponsive. The first time, all 300 available
>  "https-jsse-nio-8443" threads were consumed, with the max age
>  being around 45minutes, and all in a "S" status. This time
>  all 300 were consumed in "S" status with the oldest being
>  around ~16minutes. A restart of Tomcat on both occasions
>  freed these threads and the website became responsive again.
>  The connections are post/get methods which shouldn't take
>  very long at all.
> 
>  CPU/MEM/JVM all appear to be within normal operating limits.
>  I've not had much luck searching for articles for this
>  behavior nor finding remedies. The default timeout values are
>  used in both Tomcat and in the applications that run within
>  as far as I can tell. Hopefully someone will have some
>  insight on why the behavior could be occurring, why isn't
>  Tomcat killing the connections? Even in a RST/ACK status,
>  shouldn't Tomcat terminate the connection without an ACK from
>  the client after the default timeout?
> >
> > Can you please post:
> >
> > 1. Complete Tomcat version
> >> I can't find anything more granular than 9.0.29, is there a
> >> command to show a sub patch level?
>
> 9.0.29 is the patch-level, so that's fine. You are about 10 versions
> out of date (~1 year). Any chance for an upgrade?

They had to re-dev many apps last year when we upgraded from I want to
say 1 or 3 or something equally as horrific.  Hopefully they are
forward compatible with the newer releases and if not should surely be
tackled now before later, I will certainly bring this to the table!
>
> > 2. Connector configuration (possibly redacted)
> >> This is the 8443 section of the server.xml *8080 is available
> >> during the outage and I'm able to curl the management page to see
> >> the 300 used threads, their status, and age*
> >>
> >> <Service name="Catalina">
> >>
> >> [snip]
> >>
> >> <Connector ... connectionTimeout="2" redirectPort="8443" />
> >> [snip]
> >> <Connector port="8443"
> >>            protocol="org.apache.coyote.http11.Http11NioProtocol"
> >>            maxThreads="300" SSLEnabled="true">
> >>   <SSLHostConfig>
> >>     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >>                  certificateKeystorePassword="redacted" type="RSA" />
> >>   </SSLHostConfig>
> >> </Connector>
> >> [snip]
> >> <Connector port="8443"
> >>            protocol="org.apache.coyote.http11.Http11NioProtocol"
> >>            maxThreads="300" SSLEnabled="true">
> >>   <SSLHostConfig protocols="TLSv1.2">
> >>     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >>                  certificateKeystorePassword="redacted" type="RSA" />
> >>   </SSLHostConfig>
> >> </Connector>
> >>
> >> </Service>
>
> What, two connectors on one port? Do you get errors when starting?
No errors; one is "with HTTP2". Should I delete the other (former) one?
>
> I don't see anything obviously problematic in the above configuration
> (other than the double-definition of the 8443 connector).
>
> 300 tied-up connections (from your initial report) sounds like a
> significant number: probably the thread count.
Yes sir, that's the NIO thread count for the 8443 connector.
>
> Mark (as is often the case) is right: take some thread dumps next time
> everything locks up and see what all those threads are doing. Often,
> it's something like everything is awaiting on a db connection and the
> db pool has been exhausted or something. Relatively simple quick-fixes
> are available for that, and better, longer-term fixes as well.
>
Mark/Chris  is there a way to dump the connector threads specifically?
 Or simply is it all contained as a machine/process thread?  Sorry I'm
not really a Linux guy.

> > Do you have a single F5 or a group of them?
> >> A group of them, several HA pairs depending on internal or
> >> external and application.  This server is behind one HA pair and
> >> is a single server.
>
> Okay. Just remember that each F5 can make some large number of
> connections to Tomcat, so you need to make sure you can handle them.
>
> This was a much bigger deal back in the BIO days when thread limit =
> connection limit, and the thread limit was usually something like
> 250-300. NIO is much better, and the default connection limit is 10k
> which "ought to be enough for anyone"[1].
(lol)

I'm more used to the 1-1 of the BIO style, which kinda confused me
when I asked the F5 to truncate >X connections and alert me and there
were 600+ connections while Tomcat manager stated ~30.  Then I read
what the non-interrupt was about.
>
>
>
> [1] With apologies to Bill gates, who apparently never said anything
> of the sort.

Thanks again,
David

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread Christopher Schultz

Felix,

On 8/27/20 16:09, Felix Schumacher wrote:
>
> Am 27.08.20 um 19:35 schrieb Christopher Schultz:
>> David,
>>
>> On 8/27/20 10:48, David wrote:
>>> In the last two weeks I've had two occurrences where a single
>>> CentOS 7 production server hosting a public webpage has become
>>> unresponsive. The first time, all 300 available
>>> "https-jsse-nio-8443" threads were consumed, with the max age
>>> being around 45minutes, and all in a "S" status. This time all
>>> 300 were consumed in "S" status with the oldest being around
>>> ~16minutes. A restart of Tomcat on both occasions freed these
>>> threads and the website became responsive again. The
>>> connections are post/get methods which shouldn't take very long
>>> at all.
>>
>>> CPU/MEM/JVM all appear to be within normal operating limits.
>>> I've not had much luck searching for articles for this behavior
>>> nor finding remedies. The default timeout values are used in
>>> both Tomcat and in the applications that run within as far as I
>>> can tell. Hopefully someone will have some insight on why the
>>> behavior could be occurring, why isn't Tomcat killing the
>>> connections? Even in a RST/ACK status, shouldn't Tomcat
>>> terminate the connection without an ACK from the client after
>>> the default timeout?
>>
>> Can you please post:
>>
>> 1. Complete Tomcat version 2. Connector configuration (possibly
>> redacted)
>>
>>> Is there a graceful way to script the termination of threads
>>> in case Tomcat isn't able to for whatever reason?
>>
>> Not really.
>
> (First look at Marks response on determining the root cause)
>
> Well, there might be a way (if it is sane, I don't know). You can
> configure a valve to look for seemingly stuck threads and try to
> interrupt them:
>
> http://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Stuck_Thread_Detection_Valve
>
> There are a few caveats there. First, it only works when both
> conditions are true:
>
> * the servlets are synchronous
> * the stuck thread can be "freed" with an Interrupt
>
> But really, if your threads are stuck for more than 15 minutes, you
> have ample of time to take a thread dump and hopefully find the
> root cause, so that you don't need this valve.

This is a good idea as a band-aid, but the reality is that if you need
the StuckThreadDetectionValve then your application is probably broken
and needs to be fixed.

Here are things that can be broken which might cause thread exhaustion:

1. Poor resource management. Things like db connection pools which
can leak and/or not be refilled by the application. Everything stops
when the db pool dries up.

2. Failure to set proper IO timeouts. Guess what the default read
timeout is on a socket? Forever! If you read from a socket you might
never hear back. Sounds like a problem. Set your read timeouts, kids.
You might need to do this on your HTTP connections (and pools, and
factories, and connection-wrappers like Apache http-client), your
database config (usually in the config URL), and any remote-API
libraries you are using (which use e.g. HTTP under the hood).
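
For illustration, a minimal sketch of what setting those timeouts looks
like in code; every host, port, URL and value below is a placeholder, and
the DB2-style JDBC URL is just an example of where the driver usually
accepts such settings:

    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.net.URL;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Sketch only: every host, port and timeout value below is a placeholder.
    public class ExplicitTimeouts {

        static void httpBackendCall() throws Exception {
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("https://backend.example.com/api").openConnection();
            conn.setConnectTimeout(5_000);   // don't wait forever to connect
            conn.setReadTimeout(10_000);     // don't block forever on the response
            System.out.println(conn.getResponseCode());
        }

        static void rawSocketCall() throws Exception {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress("backend.example.com", 9000), 5_000);
                s.setSoTimeout(10_000);      // reads now throw SocketTimeoutException
                s.getInputStream().read();
            }
        }

        static void jdbcCall() throws Exception {
            DriverManager.setLoginTimeout(10);   // seconds to establish a connection
            try (Connection c = DriverManager.getConnection(
                         "jdbc:db2://db.example.com:50000/SAMPLE", "user", "secret");
                 Statement st = c.createStatement()) {
                st.setQueryTimeout(30);          // seconds before the query is cancelled
                st.executeQuery("SELECT 1 FROM SYSIBM.SYSDUMMY1").close();
            }
        }
    }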

-chris




Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread Christopher Schultz

David,

On 8/27/20 13:57, David wrote:
> On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz wrote:
>>
> David,
>
> On 8/27/20 10:48, David wrote:
 In the last two weeks I've had two occurrences where a
 single CentOS 7 production server hosting a public webpage
 has become unresponsive. The first time, all 300 available
 "https-jsse-nio-8443" threads were consumed, with the max age
 being around 45minutes, and all in a "S" status. This time
 all 300 were consumed in "S" status with the oldest being
 around ~16minutes. A restart of Tomcat on both occasions
 freed these threads and the website became responsive again.
 The connections are post/get methods which shouldn't take
 very long at all.

 CPU/MEM/JVM all appear to be within normal operating limits.
 I've not had much luck searching for articles for this
 behavior nor finding remedies. The default timeout values are
 used in both Tomcat and in the applications that run within
 as far as I can tell. Hopefully someone will have some
 insight on why the behavior could be occurring, why isn't
 Tomcat killing the connections? Even in a RST/ACK status,
 shouldn't Tomcat terminate the connection without an ACK from
 the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version
>> I can't find anything more granular than 9.0.29, is there a
>> command to show a sub patch level?

9.0.29 is the patch-level, so that's fine. You are about 10 versions
out of date (~1 year). Any chance for an upgrade?

> 2. Connector configuration (possibly redacted)
>> This is the 8443 section of the server.xml *8080 is available
>> during the outage and I'm able to curl the management page to see
>> the 300 used threads, their status, and age*
>>
>> <Service name="Catalina">
>>
>> [snip]
>>
>> <Connector ... connectionTimeout="2" redirectPort="8443" />
>> [snip]
>> <Connector port="8443"
>>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>>            maxThreads="300" SSLEnabled="true">
>>   <SSLHostConfig>
>>     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>>                  certificateKeystorePassword="redacted" type="RSA" />
>>   </SSLHostConfig>
>> </Connector>
>> [snip]
>> <Connector port="8443"
>>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>>            maxThreads="300" SSLEnabled="true">
>>   <SSLHostConfig protocols="TLSv1.2">
>>     <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>>                  certificateKeystorePassword="redacted" type="RSA" />
>>   </SSLHostConfig>
>> </Connector>
>>
>> </Service>

What, two connectors on one port? Do you get errors when starting?

I don't see anything obviously problematic in the above configuration
(other than the double-definition of the 8443 connector).

300 tied-up connections (from your initial report) sounds like a
significant number: probably the thread count.

Mark (as is often the case) is right: take some thread dumps next time
everything locks up and see what all those threads are doing. Often,
it's something like everything is awaiting on a db connection and the
db pool has been exhausted or something. Relatively simple quick-fixes
are available for that, and better, longer-term fixes as well.

> Do you have a single F5 or a group of them?
>> A group of them, several HA pairs depending on internal or
>> external and application.  This server is behind one HA pair and
>> is a single server.

Okay. Just remember that each F5 can make some large number of
connections to Tomcat, so you need to make sure you can handle them.

This was a much bigger deal back in the BIO days when thread limit =
connection limit, and the thread limit was usually something like
250-300. NIO is much better, and the default connection limit is 10k
which "ought to be enough for anyone"[1].

-chris

[1] With apologies to Bill Gates, who apparently never said anything
of the sort.




Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread Felix Schumacher


Am 27.08.20 um 19:35 schrieb Christopher Schultz:
> David,
>
> On 8/27/20 10:48, David wrote:
> > In the last two weeks I've had two occurrences where a single
> > CentOS 7 production server hosting a public webpage has become
> > unresponsive. The first time, all 300 available
> > "https-jsse-nio-8443" threads were consumed, with the max age being
> > around 45minutes, and all in a "S" status. This time all 300 were
> > consumed in "S" status with the oldest being around ~16minutes. A
> > restart of Tomcat on both occasions freed these threads and the
> > website became responsive again. The connections are post/get
> > methods which shouldn't take very long at all.
>
> > CPU/MEM/JVM all appear to be within normal operating limits. I've
> > not had much luck searching for articles for this behavior nor
> > finding remedies. The default timeout values are used in both
> > Tomcat and in the applications that run within as far as I can
> > tell. Hopefully someone will have some insight on why the behavior
> > could be occurring, why isn't Tomcat killing the connections? Even
> > in a RST/ACK status, shouldn't Tomcat terminate the connection
> > without an ACK from the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version
> 2. Connector configuration (possibly redacted)
>
> > Is there a graceful way to script the termination of threads in
> > case Tomcat isn't able to for whatever reason?
>
> Not really.

(First, look at Mark's response on determining the root cause.)

Well, there might be a way (if it is sane, I don't know). You can
configure a valve to look for seemingly stuck threads and try to
interrupt them:

http://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Stuck_Thread_Detection_Valve

There are a few caveats there. First, it only works when both
conditions are true:

 * the servlets are synchronous
 * the stuck thread can be "freed" with an Interrupt

But really, if your threads are stuck for more than 15 minutes, you have
ample time to take a thread dump and hopefully find the root cause,
so that you don't need this valve.

Felix

>
> > My research for killing threads results in system threads or
> > application threads, not Tomcat Connector connection threads, so
> > I'm not sure if this is even viable. I'm also looking into ways to
> > terminate these aged sessions via the F5. At this time I'm open to
> >  any suggestions that would be able to automate a resolution to
> > keep the system from experiencing downtime, or for any insight on
> > where to look for a root cause. Thanks in advance for any guidance
> > you can lend.
> It might actually be the F5 itself, especially if it opens up a large
> number of connections to Tomcat and then tries to open additional ones
> for some reason. If it opens 300 connections (which are then e.g.
> leaked by the F5 internally) but the 301st is refused, then your
> server is essentially inert from that point forward.
>
> NIO connectors default to max 10k connections so that's not likely the
> actual problem, here, but it could be for some configurations.
>
> Do you have a single F5 or a group of them?
>
> -chris
>





Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread Mark Thomas
On 27/08/2020 18:57, David wrote:
> On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz wrote:
 Is there a graceful way to script the termination of threads in
 case Tomcat isn't able to for whatever reason?
> 
> Not really.

What you can do is take a thread dump when this happens so you can see
what the threads are doing. That should provide some insight into where
the problem is.

Mark




Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread David
On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz wrote:
>
>
> David,
>
> On 8/27/20 10:48, David wrote:
> > In the last two weeks I've had two occurrences where a single
> > CentOS 7 production server hosting a public webpage has become
> > unresponsive. The first time, all 300 available
> > "https-jsse-nio-8443" threads were consumed, with the max age being
> > around 45minutes, and all in a "S" status. This time all 300 were
> > consumed in "S" status with the oldest being around ~16minutes. A
> > restart of Tomcat on both occasions freed these threads and the
> > website became responsive again. The connections are post/get
> > methods which shouldn't take very long at all.
> >
> > CPU/MEM/JVM all appear to be within normal operating limits. I've
> > not had much luck searching for articles for this behavior nor
> > finding remedies. The default timeout values are used in both
> > Tomcat and in the applications that run within as far as I can
> > tell. Hopefully someone will have some insight on why the behavior
> > could be occurring, why isn't Tomcat killing the connections? Even
> > in a RST/ACK status, shouldn't Tomcat terminate the connection
> > without an ACK from the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version
I can't find anything more granular than 9.0.29, is there a command to
show a sub patch level?
> 2. Connector configuration (possibly redacted)
This is the 8443 section of the server.xml *8080 is available during
the outage and I'm able to curl the management page to see the 300
used threads, their status, and age*

  <Service name="Catalina">

    [snip]

    <Connector ... connectionTimeout="2" redirectPort="8443" />

    [snip]

    <Connector port="8443"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxThreads="300" SSLEnabled="true">
      <SSLHostConfig>
        <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
                     certificateKeystorePassword="redacted" type="RSA" />
      </SSLHostConfig>
    </Connector>

    [snip]

    <Connector port="8443"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxThreads="300" SSLEnabled="true">
      <SSLHostConfig protocols="TLSv1.2">
        <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
                     certificateKeystorePassword="redacted" type="RSA" />
      </SSLHostConfig>
    </Connector>

  </Service>
>
> > Is there a graceful way to script the termination of threads in
> > case Tomcat isn't able to for whatever reason?
>
> Not really.
>
> > My research for killing threads results in system threads or
> > application threads, not Tomcat Connector connection threads, so
> > I'm not sure if this is even viable. I'm also looking into ways to
> > terminate these aged sessions via the F5. At this time I'm open to
> >  any suggestions that would be able to automate a resolution to
> > keep the system from experiencing downtime, or for any insight on
> > where to look for a root cause. Thanks in advance for any guidance
> > you can lend.
> It might actually be the F5 itself, especially if it opens up a large
> number of connections to Tomcat and then tries to open additional ones
> for some reason. If it opens 300 connections (which are then e.g.
> leaked by the F5 internally) but the 301st is refused, then your
> server is essentially inert from that point forward.
>
> NIO connectors default to max 10k connections so that's not likely the
> actual problem, here, but it could be for some configurations.
>
> Do you have a single F5 or a group of them?
A group of them, several HA pairs depending on internal or external
and application.  This server is behind one HA pair and is a single
server.
>
> - -chris
Thank you Chris!
David




Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread Christopher Schultz

David,

On 8/27/20 10:48, David wrote:
> In the last two weeks I've had two occurrences where a single
> CentOS 7 production server hosting a public webpage has become
> unresponsive. The first time, all 300 available
> "https-jsse-nio-8443" threads were consumed, with the max age being
> around 45minutes, and all in a "S" status. This time all 300 were
> consumed in "S" status with the oldest being around ~16minutes. A
> restart of Tomcat on both occasions freed these threads and the
> website became responsive again. The connections are post/get
> methods which shouldn't take very long at all.
>
> CPU/MEM/JVM all appear to be within normal operating limits. I've
> not had much luck searching for articles for this behavior nor
> finding remedies. The default timeout values are used in both
> Tomcat and in the applications that run within as far as I can
> tell. Hopefully someone will have some insight on why the behavior
> could be occurring, why isn't Tomcat killing the connections? Even
> in a RST/ACK status, shouldn't Tomcat terminate the connection
> without an ACK from the client after the default timeout?

Can you please post:

1. Complete Tomcat version
2. Connector configuration (possibly redacted)

> Is there a graceful way to script the termination of threads in
> case Tomcat isn't able to for whatever reason?

Not really.

> My research for killing threads results in system threads or
> application threads, not Tomcat Connector connection threads, so
> I'm not sure if this is even viable. I'm also looking into ways to
> terminate these aged sessions via the F5. At this time I'm open to
>  any suggestions that would be able to automate a resolution to
> keep the system from experiencing downtime, or for any insight on
> where to look for a root cause. Thanks in advance for any guidance
> you can lend.
It might actually be the F5 itself, especially if it opens up a large
number of connections to Tomcat and then tries to open additional ones
for some reason. If it opens 300 connections (which are then e.g.
leaked by the F5 internally) but the 301st is refused, then your
server is essentially inert from that point forward.

NIO connectors default to max 10k connections so that's not likely the
actual problem, here, but it could be for some configurations.

Do you have a single F5 or a group of them?

-chris




Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

2020-08-27 Thread David
  In the last two weeks I've had two occurrences where a single CentOS 7
production server hosting a public webpage has become unresponsive. The
first time, all 300 available "https-jsse-nio-8443" threads were consumed,
with the max age being around 45 minutes, and all in a "S" status. This time
all 300 were consumed in "S" status with the oldest being around
~16 minutes. A restart of Tomcat on both occasions freed these threads and
the website became responsive again. The connections are post/get methods
which shouldn't take very long at all.

CPU/MEM/JVM all appear to be within normal operating limits. I've not had
much luck searching for articles for this behavior nor finding remedies.
The default timeout values are used in both Tomcat and in the applications
that run within as far as I can tell. Hopefully someone will have some
insight on why the behavior could be occurring, why isn't Tomcat killing
the connections? Even in a RST/ACK status, shouldn't Tomcat terminate the
connection without an ACK from the client after the default timeout?

Is there a graceful way to script the termination of threads in case Tomcat
isn't able to for whatever reason? My research for killing threads results
in system threads or application threads, not Tomcat Connector connection
threads, so I'm not sure if this is even viable. I'm also looking into ways
to terminate these aged sessions via the F5.  At this time I'm open to any
suggestions that would be able to automate a resolution to keep the system
from experiencing downtime, or for any insight on where to look for a root
cause. Thanks in advance for any guidance you can lend.

Thanks, David