Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256

David,

On 8/27/20 18:14, David wrote:

>> I used the http to 8080 in order to read the Tomcat webmanager stats. I originally had issues with the JVM being too small, running out of memory, CPU spiking, threads maxing out, and whole system instability. Getting more machine memory and upping the JVM allocation has remedied all of that except for, apparently, the thread issue.

What is the memory size of the server and of the JVM?

>> I'm unsure that they were aging at that time as I couldn't get into anything, but with no room for GC to take place it would make sense that the threads would not be released.

That's not usually an issue, unless the application is using a significant amount of memory during a request and then releasing it after the request has completed.

>> My intention was to restart Tomcat nightly to lessen the chance of an occurrence until I could find a way to restart Tomcat based on the thread count and script a thread dump at the same time (likely through Solarwinds). Now that you've explained that the NIO threads are a part of the system threads, I may be able to script something like that directly on the system, with a crontab to check count; if > 295 contain NIO, dump threads and systemctl stop-start tomcat.

I wouldn't do that. Just because the threads exist does not mean they are stuck. They may be doing useful work or otherwise running just fine. I would look for other ways to detect problems.

>> That's very encouraging, as it seems a viable way to get the data I need without posing much impact to users. Your explanation of threads leads me to believe that the nightly restart may be rather moot, as it could likely be exhaustion on the downstream causing the backup on the front end. I didn't see these connected in this way and assumed they were asynchronous and independent processes.
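For what it's worth, the check David describes can be scripted without any restart logic: just count the connector threads in a captured thread dump and flag when the count approaches maxThreads. The sketch below is illustrative Python, not a recommendation to automate restarts (as noted above, a high count alone doesn't mean the threads are stuck); the thread-name pattern assumes Tomcat's default https-jsse-nio-[port]-exec-[n] naming.

```python
import re

def count_connector_threads(dump_text, pattern=r'"https-jsse-nio-8443-exec-\d+"'):
    """Count Tomcat connector threads in a jstack-style thread dump.

    The default pattern is an assumption based on Tomcat's usual thread
    naming (https-jsse-nio-[port]-exec-[n]); adjust it for your connector.
    """
    return len(re.findall(pattern, dump_text))

# Tiny fabricated dump fragment, for illustration only:
sample = '''
"https-jsse-nio-8443-exec-1" #42 daemon prio=5
   java.lang.Thread.State: TIMED_WAITING (parking)
"https-jsse-nio-8443-exec-2" #43 daemon prio=5
   java.lang.Thread.State: RUNNABLE
"main" #1 prio=5
'''

if count_connector_threads(sample) > 295:
    print("threshold exceeded - capture a full thread dump for analysis")
```

A cron job could run this against the output of `jstack <pid>` and alert (rather than restart) when the threshold is crossed.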
>> There are timeouts configured for all the DB2 backend connections, and I was in the mindset of the least timeout would kill all connections upstream/downstream by presenting the application a forcibly closed by remote host or a timeout.

If you can suffer through a few more incidents, you can probably get a LOT more information about the root problem and maybe even get it solved, instead of just trying to stop the bleeding.

>> I greatly appreciate the assistance. In looking through various articles none of this was really discussed, because either everyone knows it, or maybe it was discussed on a level where I couldn't understand it; there certainly don't seem to be any other instances of connections being open for 18-45 minutes, or if there are, it's not an issue for them.

If you have a load-balancer (which you do), then I'd expect HTTP keep-alive to keep those connections open literally all day long, only maybe expiring when you have configured them to expire "just in case" or maybe after some amount of inactivity. For an lb environment, I'd want those keep-alive timeouts to be fairly high so you don't waste any time re-constructing sockets between the lb and the app server. When an lb is NOT in the mix, you generally want /low/ keep-alive timeouts because you can't rely on clients sticking around for very long and you want to get them off your doorstep ASAP.

>> During a normal glance at the manager page, there are no connections and maybe like 5 empty lines in a "Ready" stage; even if I spam the server's logon landing page I can never see a persistent connection, so it baffled me as to how connections could hang and build up, so I'm thinking something was perhaps messed up with the backend.

If by "backend" you mean like database, etc. then that is probably the issue.
The login page is (relatively) static, so it's very difficult to put Tomcat under such load that it's hosed just giving you that same page over and over again. I don't know what your "spamming" strategy is, but you might want to use a real load-generating tool like ApacheBench (ab) or, even better, JMeter, which can actually swarm among several machines to basically DDoS your internal servers, which can be useful sometimes for stress-testing. But your tests really do have to comprise a realistic scenario, not just hammering on the login page all day.

>> The webapp names/URLs for the oldest connections didn't coincide between the two outages, so I kind of brushed it off as being application specific, however it may still be.

>> I need it to occur again and get some dumps!

Unfortunately, yes.

-chris
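If a quick-and-dirty probe is wanted before setting up ab or JMeter, a toy concurrent client like the following (a hypothetical Python helper, not a substitute for those tools) can at least generate parallel requests against a test URL. Note the explicit timeout, so the probe itself can never hang forever:

```python
import concurrent.futures
import urllib.request

def hammer(url, requests=100, concurrency=10, timeout=5):
    """Fire `requests` GETs at `url`, `concurrency` at a time.

    Returns a map of status-code (or "error") -> count. A toy stand-in
    for ab/JMeter, useful only for smoke-level load probing.
    """
    def one(_):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status
        except Exception:
            return "error"

    counts = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for status in pool.map(one, range(requests)):
            counts[status] = counts.get(status, 0) + 1
    return counts
```

For real stress-testing, `ab -n <requests> -c <concurrency> <url>` or a distributed JMeter test plan exercising realistic URLs is the way to go.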
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
On Thu, Aug 27, 2020 at 4:30 PM Christopher Schultz wrote: > > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > David, > > On 8/27/20 17:14, David wrote: > > Thank you all for the replies! > > > > On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz > > wrote: > >> > > David, > > > > On 8/27/20 13:57, David wrote: > On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz > wrote: > > > David, > > On 8/27/20 10:48, David wrote: > >>> In the last two weeks I've had two occurrences where a > >>> single CentOS 7 production server hosting a public > >>> webpage has become unresponsive. The first time, all > >>> 300 available "https-jsse-nio-8443" threads were > >>> consumed, with the max age being around 45minutes, and > >>> all in a "S" status. This time all 300 were consumed in > >>> "S" status with the oldest being around ~16minutes. A > >>> restart of Tomcat on both occasions freed these threads > >>> and the website became responsive again. The > >>> connections are post/get methods which shouldn't take > >>> very long at all. > >>> > >>> CPU/MEM/JVM all appear to be within normal operating > >>> limits. I've not had much luck searching for articles > >>> for this behavior nor finding remedies. The default > >>> timeout values are used in both Tomcat and in the > >>> applications that run within as far as I can tell. > >>> Hopefully someone will have some insight on why the > >>> behavior could be occurring, why isn't Tomcat killing > >>> the connections? Even in a RST/ACK status, shouldn't > >>> Tomcat terminate the connection without an ACK from the > >>> client after the default timeout? > > Can you please post: > > 1. Complete Tomcat version > > I can't find anything more granular than 9.0.29, is there > > a command to show a sub patch level? > > > > 9.0.29 is the patch-level, so that's fine. You are about 10 > > versions out of date (~1 year). Any chance for an upgrade? 
> > > >> They had to re-dev many apps last year when we upgraded from I > >> want to say 1 or 3 or something equally as horrific. Hopefully > >> they are forward compatible with the newer releases and if not > >> should surely be tackled now before later, I will certainly bring > >> this to the table! > > I've rarely been bitten by an upgrade from foo.bar.x to foo.bar.y. > There is a recent caveat if you are using the AJP connector, but you > are not so it's not an issue for you. > > 2. Connector configuration (possibly redacted) > > This is the 8443 section of the server.xml *8080 is > > available during the outage and I'm able to curl the > > management page to see the 300 used threads, their status, > > and age* > > > > [snip] > > > > > connectionTimeout="2" redirectPort="8443" /> [snip] > > > protocol="org.apache.coyote.http11.Http11NioProtocol" > > maxThreads="300" SSLEnabled="true" > > > > certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks" > > > > > certificateKeystorePassword="redacted" type="RSA" /> > > [snip] > port="8443" > > protocol="org.apache.coyote.http11.Http11NioProtocol" > > maxThreads="300" SSLEnabled="true" > > protocols="TLSv1.2"> > certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks" > > > > > certificateKeystorePassword="redacted" type="RSA" /> > > > > > > What, two connectors on one port? Do you get errors when starting? > >> No errors, one is "with HTTP2" should I delete the other former? > > Well, one of them will succeed in starting the and other one should > fail. Did you copy/paste your config without modification? Weird you > don't have any errors. Usually you'll get an IOException or whatever > binding to the port twice. I do recall IOExceptions and "port already in use" errors that caused Tomcat to not start, but I think these were related to syntax errors when defining catalina variables for my JVM sizing. 
I'll take another look at catalina.out and ensure I don't still see these, and will likely clean up the non "with http2" connector out of the config regardless. The only edits to the section of the supplied xml were the .jks store name and pw. > > > I don't see anything obviously problematic in the above > > configuration (other than the double-definition of the 8443 > > connector). > > > > 300 tied-up connections (from your initial report) sounds like a > > significant number: probably the thread count. > >> Yes sir, that's the NIO thread count for the 8443 connector. > > > > Mark (as is often the case) is right: take some thread dumps next > > time everything locks up and see what all those threads are doing. > > Often, it's something like everything is awaiting on a db > > connection and the db pool has been exhausted or something. > > Relatively simple quick-fixes are available for that, and better, > >
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 David, On 8/27/20 17:14, David wrote: > Thank you all for the replies! > > On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz > wrote: >> > David, > > On 8/27/20 13:57, David wrote: On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz wrote: > David, On 8/27/20 10:48, David wrote: >>> In the last two weeks I've had two occurrences where a >>> single CentOS 7 production server hosting a public >>> webpage has become unresponsive. The first time, all >>> 300 available "https-jsse-nio-8443" threads were >>> consumed, with the max age being around 45minutes, and >>> all in a "S" status. This time all 300 were consumed in >>> "S" status with the oldest being around ~16minutes. A >>> restart of Tomcat on both occasions freed these threads >>> and the website became responsive again. The >>> connections are post/get methods which shouldn't take >>> very long at all. >>> >>> CPU/MEM/JVM all appear to be within normal operating >>> limits. I've not had much luck searching for articles >>> for this behavior nor finding remedies. The default >>> timeout values are used in both Tomcat and in the >>> applications that run within as far as I can tell. >>> Hopefully someone will have some insight on why the >>> behavior could be occurring, why isn't Tomcat killing >>> the connections? Even in a RST/ACK status, shouldn't >>> Tomcat terminate the connection without an ACK from the >>> client after the default timeout? Can you please post: 1. Complete Tomcat version > I can't find anything more granular than 9.0.29, is there > a command to show a sub patch level? > > 9.0.29 is the patch-level, so that's fine. You are about 10 > versions out of date (~1 year). Any chance for an upgrade? > >> They had to re-dev many apps last year when we upgraded from I >> want to say 1 or 3 or something equally as horrific. 
Hopefully >> they are forward compatible with the newer releases and if not >> should surely be tackled now before later, I will certainly bring >> this to the table!

I've rarely been bitten by an upgrade from foo.bar.x to foo.bar.y. There is a recent caveat if you are using the AJP connector, but you are not, so it's not an issue for you.

2. Connector configuration (possibly redacted)

> This is the 8443 section of the server.xml *8080 is available during the outage and I'm able to curl the management page to see the 300 used threads, their status, and age*
>
> [snip]
>
> connectionTimeout="2" redirectPort="8443" /> [snip]
> protocol="org.apache.coyote.http11.Http11NioProtocol" maxThreads="300" SSLEnabled="true"
> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> certificateKeystorePassword="redacted" type="RSA" />
> [snip] port="8443"
> protocol="org.apache.coyote.http11.Http11NioProtocol" maxThreads="300" SSLEnabled="true" protocols="TLSv1.2">
> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> certificateKeystorePassword="redacted" type="RSA" />

What, two connectors on one port? Do you get errors when starting?

>> No errors, one is "with HTTP2" should I delete the other former?

Well, one of them will succeed in starting and the other one should fail. Did you copy/paste your config without modification? Weird you don't have any errors. Usually you'll get an IOException or whatever binding to the port twice.

> I don't see anything obviously problematic in the above configuration (other than the double-definition of the 8443 connector).
>
> 300 tied-up connections (from your initial report) sounds like a significant number: probably the thread count.

>> Yes sir, that's the NIO thread count for the 8443 connector.

> Mark (as is often the case) is right: take some thread dumps next time everything locks up and see what all those threads are doing.
> Often, it's something like everything is awaiting on a db connection and the db pool has been exhausted or something. Relatively simple quick-fixes are available for that, and better, longer-term fixes as well.

>> Mark/Chris is there a way to dump the connector threads specifically? Or simply is it all contained as a machine/process thread? Sorry I'm not really a Linux guy.

Most of the threads in the server will be connector threads. They will have names like https-nio-[port]-exec-[number]. If you get a thread dump[1], you'll get a stack trace from every thread. Rainer wrote a great presentation about them in the context of Tomcat. Feel free to give it a read: http://home.apache.org/~rjung/presentations/2018-06-13-ApacheRoadShow-JavaThreadDumps.pdf

Do you have a single F5 or a group of them?

> A group of them, several HA pairs depending on internal or external
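To make sense of a dump like that, one rough approach is to tally the connector threads by state: if most of them are parked in the same WAITING spot, that points at the shared resource they're blocked on. The sketch below is illustrative Python only; it assumes jstack-style output and Tomcat's default thread naming, and is not a parser for every JVM's dump format.

```python
import re
from collections import Counter

def connector_thread_states(dump_text):
    """Summarize https[-jsse]-nio-*-exec-* thread states from a jstack-style dump."""
    states = Counter()
    current = None
    for line in dump_text.splitlines():
        m = re.match(r'"(https(?:-jsse)?-nio-\d+-exec-\d+)"', line)
        if m:
            current = m.group(1)  # remember we're inside a connector thread entry
        elif current and "java.lang.Thread.State:" in line:
            states[line.split("java.lang.Thread.State:")[1].split()[0]] += 1
            current = None
    return states

# Tiny fabricated fragment, for illustration only:
sample = '''"https-jsse-nio-8443-exec-1" #42 daemon
   java.lang.Thread.State: TIMED_WAITING (parking)
"https-jsse-nio-8443-exec-2" #43 daemon
   java.lang.Thread.State: RUNNABLE
"main" #1
   java.lang.Thread.State: RUNNABLE
'''

print(connector_thread_states(sample))
# If most threads share one state (and one stack), that's the suspect.
```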
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
Thank you all for the replies! On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz wrote: > > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > David, > > On 8/27/20 13:57, David wrote: > > On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz > > wrote: > >> > > David, > > > > On 8/27/20 10:48, David wrote: > In the last two weeks I've had two occurrences where a > single CentOS 7 production server hosting a public webpage > has become unresponsive. The first time, all 300 available > "https-jsse-nio-8443" threads were consumed, with the max age > being around 45minutes, and all in a "S" status. This time > all 300 were consumed in "S" status with the oldest being > around ~16minutes. A restart of Tomcat on both occasions > freed these threads and the website became responsive again. > The connections are post/get methods which shouldn't take > very long at all. > > CPU/MEM/JVM all appear to be within normal operating limits. > I've not had much luck searching for articles for this > behavior nor finding remedies. The default timeout values are > used in both Tomcat and in the applications that run within > as far as I can tell. Hopefully someone will have some > insight on why the behavior could be occurring, why isn't > Tomcat killing the connections? Even in a RST/ACK status, > shouldn't Tomcat terminate the connection without an ACK from > the client after the default timeout? > > > > Can you please post: > > > > 1. Complete Tomcat version > >> I can't find anything more granular than 9.0.29, is there a > >> command to show a sub patch level? > > 9.0.29 is the patch-level, so that's fine. You are about 10 versions > out of date (~1 year). Any chance for an upgrade? They had to re-dev many apps last year when we upgraded from I want to say 1 or 3 or something equally as horrific. Hopefully they are forward compatible with the newer releases and if not should surely be tackled now before later, I will certainly bring this to the table! > > > 2. 
Connector configuration (possibly redacted) > >> This is the 8443 section of the server.xml *8080 is available > >> during the outage and I'm able to curl the management page to see > >> the 300 used threads, their status, and age* >> name="Catalina"> > >> > >> [snip] > >> > >> >> connectionTimeout="2" redirectPort="8443" /> [snip] > >> >> protocol="org.apache.coyote.http11.Http11NioProtocol" > >> maxThreads="300" SSLEnabled="true" > > >> >> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks" > >> certificateKeystorePassword="redacted" type="RSA" /> > >> [snip] >> protocol="org.apache.coyote.http11.Http11NioProtocol" > >> maxThreads="300" SSLEnabled="true" > >> protocols="TLSv1.2"> >> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks" > >> certificateKeystorePassword="redacted" type="RSA" /> > >> > > What, two connectors on one port? Do you get errors when starting? No errors, one is "with HTTP2" should I delete the other former? > > I don't see anything obviously problematic in the above configuration > (other than the double-definition of the 8443 connector). > > 300 tied-up connections (from your initial report) sounds like a > significant number: probably the thread count. Yes sir, that's the NIO thread count for the 8443 connector. > > Mark (as is often the case) is right: take some thread dumps next time > everything locks up and see what all those threads are doing. Often, > it's something like everything is awaiting on a db connection and the > db pool has been exhausted or something. Relatively simple quick-fixes > are available for that, and better, longer-term fixes as well. > Mark/Chris is there a way to dump the connector threads specifically? Or simply is it all contained as a machine/process thread? Sorry I'm not really a Linux guy. > > Do you have a single F5 or a group of them? > >> A group of them, several HA pairs depending on internal or > >> external and application. 
>> This server is behind one HA pair and is a single server.

> Okay. Just remember that each F5 can make some large number of connections to Tomcat, so you need to make sure you can handle them.
>
> This was a much bigger deal back in the BIO days when thread limit = connection limit, and the thread limit was usually something like 250 - 300. NIO is much better, and the default connection limit is 10k which "ought to be enough for anyone"[1].

(lol) I'm more used to the 1-1 of the BIO style, which kinda confused me when I asked the F5 to truncate >X connections and alert me and there were 600+ connections while Tomcat manager stated ~30. Then I read what the non-interrupt was about.

> [1] With apologies to Bill Gates, who apparently never said anything of the sort.

Thanks again,
David
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
Felix,

On 8/27/20 16:09, Felix Schumacher wrote:
> On 27.08.20 19:35, Christopher Schultz wrote:
>> David,
>>
>> On 8/27/20 10:48, David wrote:
>>> In the last two weeks I've had two occurrences where a single CentOS 7 production server hosting a public webpage has become unresponsive. The first time, all 300 available "https-jsse-nio-8443" threads were consumed, with the max age being around 45 minutes, and all in a "S" status. This time all 300 were consumed in "S" status with the oldest being around ~16 minutes. A restart of Tomcat on both occasions freed these threads and the website became responsive again. The connections are post/get methods which shouldn't take very long at all.
>>
>>> CPU/MEM/JVM all appear to be within normal operating limits. I've not had much luck searching for articles for this behavior nor finding remedies. The default timeout values are used in both Tomcat and in the applications that run within as far as I can tell. Hopefully someone will have some insight on why the behavior could be occurring, why isn't Tomcat killing the connections? Even in a RST/ACK status, shouldn't Tomcat terminate the connection without an ACK from the client after the default timeout?
>>
>> Can you please post:
>>
>> 1. Complete Tomcat version
>> 2. Connector configuration (possibly redacted)
>>
>>> Is there a graceful way to script the termination of threads in case Tomcat isn't able to for whatever reason?
>>
>> Not really.
>
> (First look at Mark's response on determining the root cause)
>
> Well, there might be a way (if it is sane, I don't know). You can configure a valve to look for seemingly stuck threads and try to interrupt them:
>
> http://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Stuck_Thread_Detection_Valve
>
> There are a few caveats there.
> First, it only works when both conditions are true:
>
> * the servlets are synchronous
> * the stuck thread can be "freed" with an interrupt
>
> But really, if your threads are stuck for more than 15 minutes, you have ample time to take a thread dump and hopefully find the root cause, so that you don't need this valve.

This is a good idea as a band-aid, but the reality is that if you need the StuckThreadDetectionValve then your application is probably broken and needs to be fixed. Here are things that can be broken which might cause thread exhaustion:

1. Poor resource management. Things like db connection pools which can leak and/or not be refilled by the application. Everything stops when the db pool dries up.

2. Failure to set proper IO timeouts. Guess what the default read timeout is on a socket? Forever! If you read from a socket you might never hear back. Sounds like a problem. Set your read timeouts, kids. You might need to do this on your HTTP connections (and pools, and factories, and connection-wrappers like Apache http-client), your database config (usually in the config URL), and any remote-API libraries you are using (which use e.g. HTTP under the hood).
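The point about default read timeouts can be demonstrated in a couple of lines. This Python sketch is purely illustrative (the same holds for raw Java sockets, where the default SO_TIMEOUT of 0 means block forever):

```python
import socket

# The library-wide default is None: a read blocks with no time limit.
print(socket.getdefaulttimeout())

# The fix: set an explicit read timeout on every socket you create,
# so a stalled peer raises socket.timeout instead of hanging the thread.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(10.0)  # seconds
print(s.gettimeout())
s.close()
```

The same discipline applies wherever a timeout knob exists: HTTP client configs, JDBC URLs, and any remote-API library that speaks HTTP under the hood.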
-chris

- To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 David, On 8/27/20 13:57, David wrote: > On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz > wrote: >> > David, > > On 8/27/20 10:48, David wrote: In the last two weeks I've had two occurrences where a single CentOS 7 production server hosting a public webpage has become unresponsive. The first time, all 300 available "https-jsse-nio-8443" threads were consumed, with the max age being around 45minutes, and all in a "S" status. This time all 300 were consumed in "S" status with the oldest being around ~16minutes. A restart of Tomcat on both occasions freed these threads and the website became responsive again. The connections are post/get methods which shouldn't take very long at all. CPU/MEM/JVM all appear to be within normal operating limits. I've not had much luck searching for articles for this behavior nor finding remedies. The default timeout values are used in both Tomcat and in the applications that run within as far as I can tell. Hopefully someone will have some insight on why the behavior could be occurring, why isn't Tomcat killing the connections? Even in a RST/ACK status, shouldn't Tomcat terminate the connection without an ACK from the client after the default timeout? > > Can you please post: > > 1. Complete Tomcat version >> I can't find anything more granular than 9.0.29, is there a >> command to show a sub patch level? 9.0.29 is the patch-level, so that's fine. You are about 10 versions out of date (~1 year). Any chance for an upgrade? > 2. 
Connector configuration (possibly redacted) >> This is the 8443 section of the server.xml *8080 is available >> during the outage and I'm able to curl the management page to see >> the 300 used threads, their status, and age* > name="Catalina"> >> >> [snip] >> >> > connectionTimeout="2" redirectPort="8443" /> [snip] >> > protocol="org.apache.coyote.http11.Http11NioProtocol" >> maxThreads="300" SSLEnabled="true" > >> > certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks" >> certificateKeystorePassword="redacted" type="RSA" /> >> [snip] > protocol="org.apache.coyote.http11.Http11NioProtocol" >> maxThreads="300" SSLEnabled="true" > > protocols="TLSv1.2"> > certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks" >> certificateKeystorePassword="redacted" type="RSA" /> >> What, two connectors on one port? Do you get errors when starting? I don't see anything obviously problematic in the above configuration (other than the double-definition of the 8443 connector). 300 tied-up connections (from your initial report) sounds like a significant number: probably the thread count. Mark (as is often the case) is right: take some thread dumps next time everything locks up and see what all those threads are doing. Often, it's something like everything is awaiting on a db connection and the db pool has been exhausted or something. Relatively simple quick-fixes are available for that, and better, longer-term fixes as well. > Do you have a single F5 or a group of them? >> A group of them, several HA pairs depending on internal or >> external and application. This server is behind one HA pair and >> is a single server. Okay. Just remember that each F5 can make some large number of connections to Tomcat, so you need to make sure you can handle them. This was a much bigger deal back in the BIO days when thread limit = connection limit, and the thread limit was usually something like 250 - - 300. 
NIO is much better, and the default connection limit is 10k which "ought to be enough for anyone"[1].

-chris

[1] With apologies to Bill Gates, who apparently never said anything of the sort.
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
On 27.08.20 19:35, Christopher Schultz wrote:
> David,
>
> On 8/27/20 10:48, David wrote:
> > In the last two weeks I've had two occurrences where a single CentOS 7 production server hosting a public webpage has become unresponsive. The first time, all 300 available "https-jsse-nio-8443" threads were consumed, with the max age being around 45 minutes, and all in a "S" status. This time all 300 were consumed in "S" status with the oldest being around ~16 minutes. A restart of Tomcat on both occasions freed these threads and the website became responsive again. The connections are post/get methods which shouldn't take very long at all.
>
> > CPU/MEM/JVM all appear to be within normal operating limits. I've not had much luck searching for articles for this behavior nor finding remedies. The default timeout values are used in both Tomcat and in the applications that run within as far as I can tell. Hopefully someone will have some insight on why the behavior could be occurring, why isn't Tomcat killing the connections? Even in a RST/ACK status, shouldn't Tomcat terminate the connection without an ACK from the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version
> 2. Connector configuration (possibly redacted)
>
> > Is there a graceful way to script the termination of threads in case Tomcat isn't able to for whatever reason?
>
> Not really.

(First look at Mark's response on determining the root cause)

Well, there might be a way (if it is sane, I don't know). You can configure a valve to look for seemingly stuck threads and try to interrupt them:

http://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Stuck_Thread_Detection_Valve

There are a few caveats there.
First, it only works when both conditions are true:

* the servlets are synchronous
* the stuck thread can be "freed" with an interrupt

But really, if your threads are stuck for more than 15 minutes, you have ample time to take a thread dump and hopefully find the root cause, so that you don't need this valve.

Felix

> > My research for killing threads results in system threads or application threads, not Tomcat Connector connection threads, so I'm not sure if this is even viable. I'm also looking into ways to terminate these aged sessions via the F5. At this time I'm open to any suggestions that would be able to automate a resolution to keep the system from experiencing downtime, or for any insight on where to look for a root cause. Thanks in advance for any guidance you can lend.

> It might actually be the F5 itself, especially if it opens up a large number of connections to Tomcat and then tries to open additional ones for some reason. If it opens 300 connections (which are then e.g. leaked by the F5 internally) but the 301st is refused, then your server is essentially inert from that point forward.
>
> NIO connectors default to max 10k connections so that's not likely the actual problem, here, but it could be for some configurations.
>
> Do you have a single F5 or a group of them?
>
> -chris
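For reference, enabling the valve Felix mentions is a one-line addition inside the Host (or Context) element of server.xml. This is a sketch based on the documented attributes; the thresholds (in seconds) are example values, not recommendations:

```xml
<!-- Logs a WARN with a stack trace for any request running longer than
     threshold seconds; optionally interrupts it after interruptThreadThreshold. -->
<Valve className="org.apache.catalina.valves.StuckThreadDetectionValve"
       threshold="600"
       interruptThreadThreshold="900" />
```

Remember the caveats above: it only helps with synchronous servlets, and only if the stuck thread responds to an interrupt.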
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
On 27/08/2020 18:57, David wrote:
> On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz wrote:
> > > Is there a graceful way to script the termination of threads in
> > > case Tomcat isn't able to for whatever reason?
> >
> > Not really.

What you can do is take a thread dump when this happens so you can see
what the threads are doing. That should provide some insight into where
the problem is.

Mark
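[Editor's note] A thread dump can be captured with the JDK's `jstack` tool, e.g. `jstack "$TOMCAT_PID" > /tmp/tomcat-threads.txt`. The helper below is a hypothetical sketch for triaging such a dump: it counts how many dumped threads belong to the 8443 connector pool named in this thread (thread-dump entries begin with the quoted thread name):

```shell
#!/bin/sh
# count_nio_threads FILE
# Counts threads from the https-jsse-nio-8443 exec pool in a thread dump.
# Dump entries start with the quoted thread name, e.g.
#   "https-jsse-nio-8443-exec-1" #42 daemon prio=5 ... waiting on condition
count_nio_threads() {
    grep -c '^"https-jsse-nio-8443-exec-' "$1"
}
```

This only tells you how many pool threads exist, not whether they are stuck; for that you still have to read the stack traces themselves.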
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz wrote:
>
> David,
>
> On 8/27/20 10:48, David wrote:
> > In the last two weeks I've had two occurrences where a single
> > CentOS 7 production server hosting a public webpage has become
> > unresponsive. The first time, all 300 available
> > "https-jsse-nio-8443" threads were consumed, with the max age being
> > around 45 minutes, and all in a "S" status. This time all 300 were
> > consumed in "S" status with the oldest being around ~16 minutes. A
> > restart of Tomcat on both occasions freed these threads and the
> > website became responsive again. The connections are post/get
> > methods which shouldn't take very long at all.
> >
> > CPU/MEM/JVM all appear to be within normal operating limits. I've
> > not had much luck searching for articles for this behavior nor
> > finding remedies. The default timeout values are used in both
> > Tomcat and in the applications that run within as far as I can
> > tell. Hopefully someone will have some insight on why the behavior
> > could be occurring, why isn't Tomcat killing the connections? Even
> > in a RST/ACK status, shouldn't Tomcat terminate the connection
> > without an ACK from the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version

I can't find anything more granular than 9.0.29, is there a command to
show a sub-patch level?

> 2. Connector configuration (possibly redacted)

This is the 8443 section of the server.xml

*8080 is available during the outage and I'm able to curl the
management page to see the 300 used threads, their status, and age*

> > Is there a graceful way to script the termination of threads in
> > case Tomcat isn't able to for whatever reason?
>
> Not really.
>
> > My research for killing threads results in system threads or
> > application threads, not Tomcat Connector connection threads, so
> > I'm not sure if this is even viable.
> > I'm also looking into ways to
> > terminate these aged sessions via the F5. At this time I'm open to
> > any suggestions that would be able to automate a resolution to
> > keep the system from experiencing downtime, or for any insight on
> > where to look for a root cause. Thanks in advance for any guidance
> > you can lend.
>
> It might actually be the F5 itself, especially if it opens up a large
> number of connections to Tomcat and then tries to open additional ones
> for some reason. If it opens 300 connections (which are then e.g.
> leaked by the F5 internally) but the 301st is refused, then your
> server is essentially inert from that point forward.
>
> NIO connectors default to max 10k connections so that's not likely the
> actual problem, here, but it could be for some configurations.
>
> Do you have a single F5 or a group of them?

A group of them, several HA pairs depending on internal or external and
application. This server is behind one HA pair and is a single server.

> -chris

Thank you Chris!
David
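[Editor's note] On the sub-patch-level question above: Tomcat versions are no finer-grained than 9.0.29, and the full version string can be printed with the version script shipped in a standard Tomcat distribution (paths assume a conventional CATALINA_HOME layout):

```shell
# Prints "Server version: Apache Tomcat/9.0.29" plus JVM and OS details
$CATALINA_HOME/bin/version.sh

# Equivalent direct invocation of the class the script wraps
java -cp "$CATALINA_HOME/lib/catalina.jar" org.apache.catalina.util.ServerInfo
```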
Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
David,

On 8/27/20 10:48, David wrote:
> In the last two weeks I've had two occurrences where a single
> CentOS 7 production server hosting a public webpage has become
> unresponsive. The first time, all 300 available
> "https-jsse-nio-8443" threads were consumed, with the max age being
> around 45 minutes, and all in a "S" status. This time all 300 were
> consumed in "S" status with the oldest being around ~16 minutes. A
> restart of Tomcat on both occasions freed these threads and the
> website became responsive again. The connections are post/get
> methods which shouldn't take very long at all.
>
> CPU/MEM/JVM all appear to be within normal operating limits. I've
> not had much luck searching for articles for this behavior nor
> finding remedies. The default timeout values are used in both
> Tomcat and in the applications that run within as far as I can
> tell. Hopefully someone will have some insight on why the behavior
> could be occurring, why isn't Tomcat killing the connections? Even
> in a RST/ACK status, shouldn't Tomcat terminate the connection
> without an ACK from the client after the default timeout?

Can you please post:

1. Complete Tomcat version
2. Connector configuration (possibly redacted)

> Is there a graceful way to script the termination of threads in
> case Tomcat isn't able to for whatever reason?

Not really.

> My research for killing threads results in system threads or
> application threads, not Tomcat Connector connection threads, so
> I'm not sure if this is even viable. I'm also looking into ways to
> terminate these aged sessions via the F5. At this time I'm open to
> any suggestions that would be able to automate a resolution to
> keep the system from experiencing downtime, or for any insight on
> where to look for a root cause. Thanks in advance for any guidance
> you can lend.
It might actually be the F5 itself, especially if it opens up a large
number of connections to Tomcat and then tries to open additional ones
for some reason. If it opens 300 connections (which are then e.g.
leaked by the F5 internally) but the 301st is refused, then your
server is essentially inert from that point forward.

NIO connectors default to max 10k connections so that's not likely the
actual problem here, but it could be for some configurations.

Do you have a single F5 or a group of them?

-chris
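[Editor's note] The limits discussed here (300 request-processing threads, the NIO default of 10,000 connections) live on the Connector element. A purely illustrative NIO connector making those attributes explicit, not the poster's actual configuration (which was not included in the archived message):

```xml
<!-- Illustrative 8443 connector: maxThreads caps concurrent request
     processing (300 matches the pool size seen in this thread);
     maxConnections caps accepted sockets (10000 is the NIO default);
     connectionTimeout is in milliseconds. -->
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="300"
           maxConnections="10000"
           connectionTimeout="20000"
           SSLEnabled="true" scheme="https" secure="true" />
```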
Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443
In the last two weeks I've had two occurrences where a single CentOS 7
production server hosting a public webpage has become unresponsive. The
first time, all 300 available "https-jsse-nio-8443" threads were
consumed, with the max age being around 45 minutes, and all in an "S"
status. This time all 300 were consumed in "S" status, with the oldest
being around ~16 minutes. A restart of Tomcat on both occasions freed
these threads and the website became responsive again. The connections
are POST/GET methods which shouldn't take very long at all.

CPU/MEM/JVM all appear to be within normal operating limits. I've not
had much luck searching for articles about this behavior nor finding
remedies. The default timeout values are used in both Tomcat and in the
applications that run within it, as far as I can tell. Hopefully
someone will have some insight into why this behavior could be
occurring and why Tomcat isn't killing the connections. Even in a
RST/ACK status, shouldn't Tomcat terminate the connection without an
ACK from the client after the default timeout?

Is there a graceful way to script the termination of threads in case
Tomcat isn't able to for whatever reason? My research into killing
threads turns up system threads or application threads, not Tomcat
Connector connection threads, so I'm not sure this is even viable. I'm
also looking into ways to terminate these aged sessions via the F5.

At this time I'm open to any suggestions that would automate a
resolution to keep the system from experiencing downtime, or any
insight on where to look for a root cause. Thanks in advance for any
guidance you can lend.

Thanks,
David