Re: AJP connection pool issue bug?
I missed some of these messages before... I apologize. Can I send these to you privately?

On Wed, Oct 4, 2017 at 4:01 PM, Christopher Schultz <ch...@christopherschultz.net> wrote:

> TCD,
>
> On 10/4/17 3:45 PM, TurboChargedDad . wrote:
> > Perhaps I am not wording my question correctly.
>
> Can you confirm that the connection-pool exhaustion appears to be happening on the AJP client (httpd/mod_proxy_ajp) and NOT on the server (Tomcat/AJP)?
>
> If so, the problem will likely not improve by switching over to an NIO-based connector on the Tomcat side.
>
> Having said that, the real problem is likely to be simple arithmetic. Remember this expression:
>
>     Ctc = Nhttpd * Cworkers
>
> Ctc = connections Tomcat should be prepared to accept (e.g. Connector maxConnections)
> Nhttpd = number of httpd servers
> Cworkers = total number of connections in the httpd connection pools across all workers(!)
>
> Imagine the following scenario:
>
>     Nhttpd = 2
>     Cworker = 200
>     Ntomcat = 2
>
> On httpd server A, we have a connection pool with 200 connections. If Tomcat A goes down, all 200 connections will go to Tomcat B. If that happens on both proxies (Tomcat A stops responding), then both proxies will send all 200 connections to Tomcat B. That means that Tomcat B needs to be able to support 400 connections, not 200.
>
> Let's say you now have 5 workers (1 for each application). Each worker gets its own connection pool, and each connection pool has 200 connections in it. Now each httpd instance actually has 1000 (potential) connections in its connection pools, and if Tomcat A goes down, Tomcat B must be able to handle 2000 connections (1000 from httpd A and 1000 from httpd B).
>
> At some point, you can't provision enough threads to handle all of those connections.
> The solution (bringing this back around again) is to use NIO, because you can handle a LOT more connections with a lower number of threads. NIO doesn't allow you to handle more *concurrent* traffic (in fact, it makes performance a tiny bit worse than BIO), but it will allow you to have TONS of idle connections that aren't "wasting" request-processing threads that are just sitting there waiting for another actual request to come across the wire.
>
> > As a test I changed the following line in one of the many tomcat instances running on the server and bounced it.
> > Old New protocol="org.apache.coyote.ajp.AjpNioProtocol" redirectPort="8443" maxThreads="300" />
>
> Yep, that's how to do it.
>
> > As the docs state, I am able to verify that it did in fact switch over to NIO.
> >
> > INFO: Starting ProtocolHandler ["ajp-nio-9335"]
>
> Good. Now you can handle many idle connections with the same number of threads.
>
> > Will running NIO and BIO on the same box have a negative impact?
>
> No.
>
> > I am thinking they should all be switched to NIO; this was just a test to see if I was understanding what I was reading.
>
> I would recommend NIO in all cases.
>
> > That being said, I suspect there are going to be far more tweaks that need to be applied, as there are none to date.
>
> Hopefully not. A recent Tomcat (which you don't actually have) with a stock configuration should be fairly well configured to handle a great deal of traffic without falling over.
>
> > I also know that the HTTPD server is running in prefork mode.
>
> That will pose some other issues for you, mostly the ability to handle bursts of high concurrency from your clients. You can consider it out-of-scope for this discussion, though. What we want to do for you is stop httpd+Tomcat from freaking out and getting stopped up with even a small number of users.
> > Which I think leaves me with no control over how many connections can be handed back from apache on a site-by-site basis.
>
> Probably not on a site-by-site basis, but you can adjust the connection-pool size on a per-worker basis. For prefork it MUST BE connection_pool_size=1 (the default for prefork httpd), and for "worker" and similarly-threaded MPMs the default should be fine to use.
>
> > Really having a hard time explaining to others how BIO could have caused the connection pool for another user to become exhausted.
>
> Well...
>
> If one of your Tomcats locks up (database is dead; you might want to check how the application is accessing that... infinite timeouts can be a real killer here), it can tie up connections from mod_proxy_ajp's connection pool. But those connections should be per-worker and shouldn't interfere with each other. Unless you have an uber-worker that handles everything for all those various Tomcats.
>
> Can you give us a peek at your worker configuration? You explained it a bit in your first post, but it might be time for some more details...
>
> -chris
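If the front end uses mod_proxy_ajp (as described above), the per-worker pool is tuned with connection parameters on ProxyPass; connection_pool_size is the mod_jk name for the equivalent knob. A hypothetical sketch only — the paths, ports, and values below are illustrative, not taken from the poster's actual configuration:

```apache
# Hypothetical per-tenant AJP workers, one ProxyPass per Tomcat port.
# max=     caps the connection pool per worker, per httpd child process
#          (under prefork each child is single-threaded, so effectively 1).
# ttl=     closes pooled backend connections that sit idle too long.
# timeout= bounds how long a request waits on a hung Tomcat, so one dead
#          backend cannot pin httpd processes indefinitely.
ProxyPass "/user9/"  "ajp://localhost:9335/" max=20 ttl=60 timeout=30
ProxyPass "/user10/" "ajp://localhost:9336/" max=20 ttl=60 timeout=30
```

Because each worker gets its own pool, a sane timeout here is what keeps one tenant's dead database from starving everyone else's workers.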
Re: AJP connection pool issue bug?
TCD,

On 10/4/17 3:45 PM, TurboChargedDad . wrote:
> Perhaps I am not wording my question correctly.

Can you confirm that the connection-pool exhaustion appears to be happening on the AJP client (httpd/mod_proxy_ajp) and NOT on the server (Tomcat/AJP)?

If so, the problem will likely not improve by switching over to an NIO-based connector on the Tomcat side.

Having said that, the real problem is likely to be simple arithmetic. Remember this expression:

    Ctc = Nhttpd * Cworkers

Ctc = connections Tomcat should be prepared to accept (e.g. Connector maxConnections)
Nhttpd = number of httpd servers
Cworkers = total number of connections in the httpd connection pools across all workers(!)

Imagine the following scenario:

    Nhttpd = 2
    Cworker = 200
    Ntomcat = 2

On httpd server A, we have a connection pool with 200 connections. If Tomcat A goes down, all 200 connections will go to Tomcat B. If that happens on both proxies (Tomcat A stops responding), then both proxies will send all 200 connections to Tomcat B. That means that Tomcat B needs to be able to support 400 connections, not 200.

Let's say you now have 5 workers (1 for each application). Each worker gets its own connection pool, and each connection pool has 200 connections in it. Now each httpd instance actually has 1000 (potential) connections in its connection pools, and if Tomcat A goes down, Tomcat B must be able to handle 2000 connections (1000 from httpd A and 1000 from httpd B).

At some point, you can't provision enough threads to handle all of those connections. The solution (bringing this back around again) is to use NIO, because you can handle a LOT more connections with a lower number of threads.
NIO doesn't allow you to handle more *concurrent* traffic (in fact, it makes performance a tiny bit worse than BIO), but it will allow you to have TONS of idle connections that aren't "wasting" request-processing threads that are just sitting there waiting for another actual request to come across the wire.

> As a test I changed the following line in one of the many tomcat instances running on the server and bounced it.
> Old New protocol="org.apache.coyote.ajp.AjpNioProtocol" redirectPort="8443" maxThreads="300" />

Yep, that's how to do it.

> As the docs state, I am able to verify that it did in fact switch over to NIO.
>
> INFO: Starting ProtocolHandler ["ajp-nio-9335"]

Good. Now you can handle many idle connections with the same number of threads.

> Will running NIO and BIO on the same box have a negative impact?

No.

> I am thinking they should all be switched to NIO; this was just a test to see if I was understanding what I was reading.

I would recommend NIO in all cases.

> That being said, I suspect there are going to be far more tweaks that need to be applied, as there are none to date.

Hopefully not. A recent Tomcat (which you don't actually have) with a stock configuration should be fairly well configured to handle a great deal of traffic without falling over.

> I also know that the HTTPD server is running in prefork mode.

That will pose some other issues for you, mostly the ability to handle bursts of high concurrency from your clients. You can consider it out-of-scope for this discussion, though. What we want to do for you is stop httpd+Tomcat from freaking out and getting stopped up with even a small number of users.

> Which I think leaves me with no control over how many connections can be handed back from apache on a site-by-site basis.

Probably not on a site-by-site basis, but you can adjust the connection-pool size on a per-worker basis.
For prefork it MUST BE connection_pool_size=1 (the default for prefork httpd), and for "worker" and similarly-threaded MPMs the default should be fine to use.

> Really having a hard time explaining to others how BIO could have caused the connection pool for another user to become exhausted.

Well...

If one of your Tomcats locks up (database is dead; you might want to check how the application is accessing that... infinite timeouts can be a real killer here), it can tie up connections from mod_proxy_ajp's connection pool. But those connections should be per-worker and shouldn't interfere with each other. Unless you have an uber-worker that handles everything for all those various Tomcats.

Can you give us a peek at your worker configuration? You explained it a bit in your first post, but it might be time for some more details...

-chris
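Chris's capacity arithmetic above can be sketched as a quick calculation. This is an illustrative sketch only; the function name is made up, and the numbers mirror the scenario in his message:

```python
# Sketch of the capacity arithmetic: Ctc = Nhttpd * Cworkers.
# Worst case: every httpd proxy fails over every worker's full
# connection pool onto a single surviving Tomcat.

def tomcat_max_connections(n_httpd: int, pool_per_worker: int, n_workers: int) -> int:
    """Connections one Tomcat must be prepared to accept in the worst case."""
    c_workers = pool_per_worker * n_workers  # total pool size per httpd instance
    return n_httpd * c_workers

# The scenario above: 2 httpd servers, 200-connection pools.
assert tomcat_max_connections(2, 200, 1) == 400   # one worker: Tomcat B needs 400
assert tomcat_max_connections(2, 200, 5) == 2000  # five workers: 2000 connections
```

With BIO, each of those connections would also need a thread, which is why the numbers stop being provisionable; NIO decouples idle connections from threads.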
Re: AJP connection pool issue bug?
Perhaps I am not wording my question correctly. Today we have...

    [Proxy 1] | [Proxy 2] --(HTTPS)--> [Apache --(AJP)--> tomcat1]

So we send the information from the proxies over HTTPS to the instance running the tomcat server. The SSL is terminated by Apache/HTTPD and handed back to tomcat over AJP. Tomcat doesn't handle SSL in any way. It can't; it's not configured to do so. Is that how you understand the question I asked?

As a test I changed the following line in one of the many tomcat instances running on the server and bounced it. Old New

As the docs state, I am able to verify that it did in fact switch over to NIO.

    INFO: Starting ProtocolHandler ["ajp-nio-9335"]

Will running NIO and BIO on the same box have a negative impact? I am thinking they should all be switched to NIO; this was just a test to see if I was understanding what I was reading. That being said, I suspect there are going to be far more tweaks that need to be applied, as there are none to date.

I also know that the HTTPD server is running in prefork mode. Which I think leaves me with no control over how many connections can be handed back from apache on a site-by-site basis. Really having a hard time explaining to others how BIO could have caused the connection pool for another user to become exhausted.

Thanks,
TCD

On Wed, Oct 4, 2017 at 1:31 PM, Mark Thomas wrote:

> On 04/10/17 19:26, TurboChargedDad . wrote:
> > My initial reads about BIO vs NIO seem to involve terminating SSL at the tomcat instance. Which we do not do. Am I running off into the weeds with that?
>
> Yes. The NIO AJP connector is a drop-in replacement for the BIO AJP connector.
>
> https://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html#Standard_Implementations
>
> Look for the protocol attribute.
>
> Mark
>
> > Thanks,
> > TCD
> >
> > On Wed, Oct 4, 2017 at 9:17 AM, Mark Thomas wrote:
> >
> >> On 04/10/17 13:51, TurboChargedDad . wrote:
> >>> Hello all..
> >>> I am going to do my best to describe my problem. Hopefully someone will have some sort of insight.
> >>>
> >>> Tomcat 7.0.41 (working on updating that)
> >>> Java 1.6 (working on getting this updated to the latest minor release)
> >>> RHEL Linux
> >>>
> >>> I inherited an opti-tenant setup. Individual user accounts on the system each have their own Tomcat instance; each is started using sysinit. This is done to keep each website in its own permissible world so one website can't interfere with another's data.
> >>>
> >>> There are two load-balanced apache proxies at the edge that point to one Tomcat server (I know, I know, but again, I inherited this).
> >>>
> >>> Apache lays over the top of tomcat to terminate SSL and uses AJP to proxypass to each tomcat instance based on the user's assigned port.
> >>>
> >>> Things have run fine for years (so I am being told, anyway) until recently. Let me give an example of an outage.
> >>>
> >>> User1, user2 and user3 all use unique databases on a shared database server, SQL server 10.
> >>>
> >>> User 4 runs on a windows jboss server and also has a database on shared database server 10.
> >>>
> >>> Users 5-50 all run on the mentioned Linux server using tomcat and have databases on *other* various shared database servers, but have nothing to do with database server 10.
> >>>
> >>> User 4 had a stored proc go wild on database server 10, basically knocking it offline.
> >>>
> >>> Now one would expect sites 1-4 to experience interruption of service because they use a shared DBMS platform. However.
> >>>
> >>> Every single site goes down. I monitor the connections for each site with a custom tool. When this happens, the connections start stacking up across all the components.
> >>> (Proxies all the way through the stack.) Looking at the AJP connection pool threads for user 9 shows that user has exhausted their AJP connection pool threads. They are maxed out at 300, yet that user doesn't have high activity at all. The CPU load, memory usage and traffic for everything except SQL server 10 are stable during this outage. The proxies consume more and more memory the longer the outage lasts, but that's expected as the connection counts stack up into the thousands. After a short time, all the sites' apache/SSL-termination layer starts throwing AJP timeout errors. Shortly after that, the edge proxies naturally also start throwing timeout errors of their own.
> >>>
> >>> I am only watching user 9, using a tool that gives me insight into what's going on via JMX metrics, but I suspect that once I get all the others instrumented I will see the same thing: maxed-out AJP connection pools.
> >>>
> >>> Aren't those supposed to be
Re: AJP connection pool issue bug?
On 04/10/17 19:26, TurboChargedDad . wrote:
> My initial reads about BIO vs NIO seem to involve terminating SSL at the tomcat instance. Which we do not do. Am I running off into the weeds with that?

Yes. The NIO AJP connector is a drop-in replacement for the BIO AJP connector.

https://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html#Standard_Implementations

Look for the protocol attribute.

Mark

> Thanks,
> TCD
>
> On Wed, Oct 4, 2017 at 9:17 AM, Mark Thomas wrote:
>
>> On 04/10/17 13:51, TurboChargedDad . wrote:
>>> Hello all..
>>> I am going to do my best to describe my problem. Hopefully someone will have some sort of insight.
>>>
>>> Tomcat 7.0.41 (working on updating that)
>>> Java 1.6 (working on getting this updated to the latest minor release)
>>> RHEL Linux
>>>
>>> I inherited an opti-tenant setup. Individual user accounts on the system each have their own Tomcat instance; each is started using sysinit. This is done to keep each website in its own permissible world so one website can't interfere with another's data.
>>>
>>> There are two load-balanced apache proxies at the edge that point to one Tomcat server (I know, I know, but again, I inherited this).
>>>
>>> Apache lays over the top of tomcat to terminate SSL and uses AJP to proxypass to each tomcat instance based on the user's assigned port.
>>>
>>> Things have run fine for years (so I am being told, anyway) until recently. Let me give an example of an outage.
>>>
>>> User1, user2 and user3 all use unique databases on a shared database server, SQL server 10.
>>>
>>> User 4 runs on a windows jboss server and also has a database on shared database server 10.
>>>
>>> Users 5-50 all run on the mentioned Linux server using tomcat and have databases on *other* various shared database servers, but have nothing to do with database server 10.
>>>
>>> User 4 had a stored proc go wild on database server 10, basically knocking it offline.
>>> Now one would expect sites 1-4 to experience interruption of service because they use a shared DBMS platform. However.
>>>
>>> Every single site goes down. I monitor the connections for each site with a custom tool. When this happens, the connections start stacking up across all the components. (Proxies all the way through the stack.) Looking at the AJP connection pool threads for user 9 shows that user has exhausted their AJP connection pool threads. They are maxed out at 300, yet that user doesn't have high activity at all. The CPU load, memory usage and traffic for everything except SQL server 10 are stable during this outage. The proxies consume more and more memory the longer the outage lasts, but that's expected as the connection counts stack up into the thousands. After a short time, all the sites' apache/SSL-termination layer starts throwing AJP timeout errors. Shortly after that, the edge proxies naturally also start throwing timeout errors of their own.
>>>
>>> I am only watching user 9, using a tool that gives me insight into what's going on via JMX metrics, but I suspect that once I get all the others instrumented I will see the same thing: maxed-out AJP connection pools.
>>>
>>> Aren't those supposed to be unique per user/JVM? Am I missing something in the docs?
>>>
>>> Any assistance from the tomcat gods is much appreciated.
>>
>> TL;DR - Try switching to the NIO AJP connector on Tomcat.
>>
>> Take a look at this session I just uploaded from TomcatCon London last week. You probably want to start around 35:00 and the topic of thread exhaustion.
>>
>> HTH,
>>
>> Mark
>>
>> P.S. The other sessions we have are on the way. I plan to update the site and post links once I have them all uploaded.
-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
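The protocol switch Mark describes is a one-attribute change on the AJP <Connector> in server.xml. A sketch of what the before/after might look like on Tomcat 7 — the port and maxThreads follow values mentioned elsewhere in the thread, while maxConnections is an illustrative addition, not the poster's actual configuration:

```xml
<!-- Before: blocking (BIO) AJP connector; every open connection,
     even an idle pooled one, ties up a request-processing thread -->
<Connector port="9335" protocol="AJP/1.3"
           redirectPort="8443" maxThreads="300" />

<!-- After: non-blocking (NIO) AJP connector; idle connections no longer
     each consume a thread, so maxConnections can exceed maxThreads -->
<Connector port="9335" protocol="org.apache.coyote.ajp.AjpNioProtocol"
           redirectPort="8443" maxThreads="300"
           maxConnections="2000" />
```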
Re: AJP connection pool issue bug?
My initial reads about BIO vs NIO seem to involve terminating SSL at the tomcat instance. Which we do not do. Am I running off into the weeds with that?

Thanks,
TCD

On Wed, Oct 4, 2017 at 9:17 AM, Mark Thomas wrote:

> On 04/10/17 13:51, TurboChargedDad . wrote:
> > Hello all..
> > I am going to do my best to describe my problem. Hopefully someone will have some sort of insight.
> >
> > Tomcat 7.0.41 (working on updating that)
> > Java 1.6 (working on getting this updated to the latest minor release)
> > RHEL Linux
> >
> > I inherited an opti-tenant setup. Individual user accounts on the system each have their own Tomcat instance; each is started using sysinit. This is done to keep each website in its own permissible world so one website can't interfere with another's data.
> >
> > There are two load-balanced apache proxies at the edge that point to one Tomcat server (I know, I know, but again, I inherited this).
> >
> > Apache lays over the top of tomcat to terminate SSL and uses AJP to proxypass to each tomcat instance based on the user's assigned port.
> >
> > Things have run fine for years (so I am being told, anyway) until recently. Let me give an example of an outage.
> >
> > User1, user2 and user3 all use unique databases on a shared database server, SQL server 10.
> >
> > User 4 runs on a windows jboss server and also has a database on shared database server 10.
> >
> > Users 5-50 all run on the mentioned Linux server using tomcat and have databases on *other* various shared database servers, but have nothing to do with database server 10.
> >
> > User 4 had a stored proc go wild on database server 10, basically knocking it offline.
> >
> > Now one would expect sites 1-4 to experience interruption of service because they use a shared DBMS platform. However.
> >
> > Every single site goes down. I monitor the connections for each site with a custom tool.
> > When this happens, the connections start stacking up across all the components. (Proxies all the way through the stack.) Looking at the AJP connection pool threads for user 9 shows that user has exhausted their AJP connection pool threads. They are maxed out at 300, yet that user doesn't have high activity at all. The CPU load, memory usage and traffic for everything except SQL server 10 are stable during this outage. The proxies consume more and more memory the longer the outage lasts, but that's expected as the connection counts stack up into the thousands. After a short time, all the sites' apache/SSL-termination layer starts throwing AJP timeout errors. Shortly after that, the edge proxies naturally also start throwing timeout errors of their own.
> >
> > I am only watching user 9, using a tool that gives me insight into what's going on via JMX metrics, but I suspect that once I get all the others instrumented I will see the same thing: maxed-out AJP connection pools.
> >
> > Aren't those supposed to be unique per user/JVM? Am I missing something in the docs?
> >
> > Any assistance from the tomcat gods is much appreciated.
>
> TL;DR - Try switching to the NIO AJP connector on Tomcat.
>
> Take a look at this session I just uploaded from TomcatCon London last week. You probably want to start around 35:00 and the topic of thread exhaustion.
>
> HTH,
>
> Mark
>
> P.S. The other sessions we have are on the way. I plan to update the site and post links once I have them all uploaded.
Re: AJP connection pool issue bug?
On 4 October 2017 15:17:25 BST, Mark Thomas wrote:

> On 04/10/17 13:51, TurboChargedDad . wrote:
> > Hello all..
> > I am going to do my best to describe my problem. Hopefully someone will have some sort of insight.
> >
> > Tomcat 7.0.41 (working on updating that)
> > Java 1.6 (working on getting this updated to the latest minor release)
> > RHEL Linux
> >
> > I inherited an opti-tenant setup. Individual user accounts on the system each have their own Tomcat instance; each is started using sysinit. This is done to keep each website in its own permissible world so one website can't interfere with another's data.
> >
> > There are two load-balanced apache proxies at the edge that point to one Tomcat server (I know, I know, but again, I inherited this).
> >
> > Apache lays over the top of tomcat to terminate SSL and uses AJP to proxypass to each tomcat instance based on the user's assigned port.
> >
> > Things have run fine for years (so I am being told, anyway) until recently. Let me give an example of an outage.
> >
> > User1, user2 and user3 all use unique databases on a shared database server, SQL server 10.
> >
> > User 4 runs on a windows jboss server and also has a database on shared database server 10.
> >
> > Users 5-50 all run on the mentioned Linux server using tomcat and have databases on *other* various shared database servers, but have nothing to do with database server 10.
> >
> > User 4 had a stored proc go wild on database server 10, basically knocking it offline.
> >
> > Now one would expect sites 1-4 to experience interruption of service because they use a shared DBMS platform. However.
> >
> > Every single site goes down. I monitor the connections for each site with a custom tool. When this happens, the connections start stacking up across all the components. (Proxies all the way through the stack.) Looking at the AJP connection pool threads for user 9 shows that user has exhausted their AJP connection pool threads.
> > They are maxed out at 300, yet that user doesn't have high activity at all. The CPU load, memory usage and traffic for everything except SQL server 10 are stable during this outage. The proxies consume more and more memory the longer the outage lasts, but that's expected as the connection counts stack up into the thousands. After a short time, all the sites' apache/SSL-termination layer starts throwing AJP timeout errors. Shortly after that, the edge proxies naturally also start throwing timeout errors of their own.
> >
> > I am only watching user 9, using a tool that gives me insight into what's going on via JMX metrics, but I suspect that once I get all the others instrumented I will see the same thing: maxed-out AJP connection pools.
> >
> > Aren't those supposed to be unique per user/JVM? Am I missing something in the docs?
> >
> > Any assistance from the tomcat gods is much appreciated.
>
> TL;DR - Try switching to the NIO AJP connector on Tomcat.
>
> Take a look at this session I just uploaded from TomcatCon London last week. You probably want to start around 35:00 and the topic of thread exhaustion.

Whoops. Here is the link.

https://youtu.be/2QYWp1k5QQM

Mark

> HTH,
>
> Mark
>
> P.S. The other sessions we have are on the way. I plan to update the site and post links once I have them all uploaded.
Re: AJP connection pool issue bug?
On 04/10/17 13:51, TurboChargedDad . wrote:

> Hello all..
> I am going to do my best to describe my problem. Hopefully someone will have some sort of insight.
>
> Tomcat 7.0.41 (working on updating that)
> Java 1.6 (working on getting this updated to the latest minor release)
> RHEL Linux
>
> I inherited an opti-tenant setup. Individual user accounts on the system each have their own Tomcat instance; each is started using sysinit. This is done to keep each website in its own permissible world so one website can't interfere with another's data.
>
> There are two load-balanced apache proxies at the edge that point to one Tomcat server (I know, I know, but again, I inherited this).
>
> Apache lays over the top of tomcat to terminate SSL and uses AJP to proxypass to each tomcat instance based on the user's assigned port.
>
> Things have run fine for years (so I am being told, anyway) until recently. Let me give an example of an outage.
>
> User1, user2 and user3 all use unique databases on a shared database server, SQL server 10.
>
> User 4 runs on a windows jboss server and also has a database on shared database server 10.
>
> Users 5-50 all run on the mentioned Linux server using tomcat and have databases on *other* various shared database servers, but have nothing to do with database server 10.
>
> User 4 had a stored proc go wild on database server 10, basically knocking it offline.
>
> Now one would expect sites 1-4 to experience interruption of service because they use a shared DBMS platform. However.
>
> Every single site goes down. I monitor the connections for each site with a custom tool. When this happens, the connections start stacking up across all the components. (Proxies all the way through the stack.) Looking at the AJP connection pool threads for user 9 shows that user has exhausted their AJP connection pool threads. They are maxed out at 300, yet that user doesn't have high activity at all.
> The CPU load, memory usage and traffic for everything except SQL server 10 are stable during this outage. The proxies consume more and more memory the longer the outage lasts, but that's expected as the connection counts stack up into the thousands. After a short time, all the sites' apache/SSL-termination layer starts throwing AJP timeout errors. Shortly after that, the edge proxies naturally also start throwing timeout errors of their own.
>
> I am only watching user 9, using a tool that gives me insight into what's going on via JMX metrics, but I suspect that once I get all the others instrumented I will see the same thing: maxed-out AJP connection pools.
>
> Aren't those supposed to be unique per user/JVM? Am I missing something in the docs?
>
> Any assistance from the tomcat gods is much appreciated.

TL;DR - Try switching to the NIO AJP connector on Tomcat.

Take a look at this session I just uploaded from TomcatCon London last week. You probably want to start around 35:00 and the topic of thread exhaustion.

HTH,

Mark

P.S. The other sessions we have are on the way. I plan to update the site and post links once I have them all uploaded.
AJP connection pool issue bug?
Hello all..

I am going to do my best to describe my problem. Hopefully someone will have some sort of insight.

Tomcat 7.0.41 (working on updating that)
Java 1.6 (working on getting this updated to the latest minor release)
RHEL Linux

I inherited an opti-tenant setup. Individual user accounts on the system each have their own Tomcat instance; each is started using sysinit. This is done to keep each website in its own permissible world so one website can't interfere with another's data.

There are two load-balanced apache proxies at the edge that point to one Tomcat server (I know, I know, but again, I inherited this).

Apache lays over the top of tomcat to terminate SSL and uses AJP to proxypass to each tomcat instance based on the user's assigned port.

Things have run fine for years (so I am being told, anyway) until recently. Let me give an example of an outage.

User1, user2 and user3 all use unique databases on a shared database server, SQL server 10.

User 4 runs on a windows jboss server and also has a database on shared database server 10.

Users 5-50 all run on the mentioned Linux server using tomcat and have databases on *other* various shared database servers, but have nothing to do with database server 10.

User 4 had a stored proc go wild on database server 10, basically knocking it offline.

Now one would expect sites 1-4 to experience interruption of service because they use a shared DBMS platform. However.

Every single site goes down. I monitor the connections for each site with a custom tool. When this happens, the connections start stacking up across all the components. (Proxies all the way through the stack.) Looking at the AJP connection pool threads for user 9 shows that user has exhausted their AJP connection pool threads. They are maxed out at 300, yet that user doesn't have high activity at all. The CPU load, memory usage and traffic for everything except SQL server 10 are stable during this outage.
The proxies consume more and more memory the longer the outage lasts, but that's expected as the connection counts stack up into the thousands. After a short time, all the sites' apache/SSL-termination layer starts throwing AJP timeout errors. Shortly after that, the edge proxies naturally also start throwing timeout errors of their own.

I am only watching user 9, using a tool that gives me insight into what's going on via JMX metrics, but I suspect that once I get all the others instrumented I will see the same thing: maxed-out AJP connection pools.

Aren't those supposed to be unique per user/JVM? Am I missing something in the docs?

Any assistance from the tomcat gods is much appreciated. Thanks in advance.

TCD
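One way to see how a single hung tenant can stall every site behind a prefork httpd: each prefork child handles one request at a time, and a request proxied to a dead backend pins its child for the full proxy timeout. A rough Little's-law sketch with hypothetical numbers — the request rate, timeout, and child limit below are assumptions for illustration, not measurements from this setup:

```python
# Sketch: average concurrent stuck requests = arrival rate * time each
# request stays stuck (Little's law). Once that exceeds the number of
# prefork children (MaxClients / MaxRequestWorkers), every site behind
# this httpd stalls, even tenants whose Tomcats are perfectly healthy.

def children_pinned(request_rate_per_s: float, proxy_timeout_s: float) -> float:
    """Average number of httpd children tied up waiting on a dead backend."""
    return request_rate_per_s * proxy_timeout_s

MAX_REQUEST_WORKERS = 256           # hypothetical prefork child limit

# 5 req/s to the dead tenant, 300 s proxy timeout:
stuck = children_pinned(5, 300)     # 1500 notional in-flight stuck requests
assert stuck > MAX_REQUEST_WORKERS  # the whole child pool is exhausted
```

This is consistent with what the thread observes: the exhaustion is at the proxy layer, so every tenant times out together even though only one database is down.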