This message is long.  Lots of details, a fair amount of history.

The primary Tomcat version we've got is 7.0.42. Specifically, it is the Tomcat that's included with Liferay 6.2. This is why we haven't attempted an upgrade even though the version we're running is five years old -- we don't want to rock the boat too hard and risk everything not working.  Liferay is battle-tested with that version of Tomcat.

Background:

Every now and then, we find that our MySQL master has reached the connection limit of 600 connections, and websites stop working, because they are trying to open new connections and failing.  When I look at the DB server, it shows a HUGE number of connections (300 to 500 out of the 600 max) that have idle times between one hour and eight hours.  The MySQL server-side idle timeout is at the default of 28800 seconds -- eight hours.  The idle connections are coming from the webservers.

There are five main webservers, two of which run exclusively Liferay 6.2, and three that are running older apps on Tomcat version that's probably ANCIENT, as well as Liferay 6.1 with Tomcat 7.0.23.  I can't seem to figure out what version the really ancient one is.  There are no version numbers that I can see in the files in the program directory.  We also have a staging server, a dev server, and a few ancillary systems.  The staging and dev servers mainly use a dev/staging database server, but do need to connect to the main DB server for some things.

When we run out of connections, each of these five webservers has nearly 100 (and sometimes MORE than 100) open connections to the database server.  And the majority of them have been idle for a LONG time.

A number of JNDI database resources are configured in conf/context.xml.  So we have connection pooling.  But even though there are TONS of connections already established from Tomcat, and completely idle from the DB server's perspective, the application is STILL trying to open a NEW connection when somebody visits a page.  This is the precise issue that connection pooling SHOULD be preventing.

I've discussed this with the commons-user mailing list, from a DBCP perspective.  I started there because there is some different code that I wrote, which doesn't run in Tomcat, and uses DBCP2 natively.  I'm a lot more familiar with my code than the code running in Tomcat.

Based on my discussions with commons-user, I really think that the reason that the DB pools are trying to create more connections even though there's already a lot of them open is because the connections are actually abandoned.  I suspect they were never closed by the application, so the pool still has them as active, and thinks it can't use them.  Now that the discussion directly involves configuring pools using Tomcat's own DBCP implementation, they're not really able to help me any further.

If we are dealing with abandoned connections as I suspect, then I need to ask why abandoned connection removal isn't actually working, and how to configure it so that it DOES work.

We have figured out how to log the number of idle and active connections on the datasource, but it's going to take some time to get that logging into the production servers, so I don't have definitive proof that the connections are actually active.

Here's a redacted configuration that we have in place for one of the DB connection pools on the production servers:

        <Resource name="jdbc/REDACTED" auth="Container" factory="org.apache.tomcat.dbcp.dbcp.BasicDataSourceFactory" driverClassName="com.mysql.jdbc.Driver" type="javax.sql.DataSource" maxActive="60" maxIdle="10" maxWait="30000" removeAbandoned="true" removeAbandonedTimeout="30" username="REDACTED" password="REDACTED" testOnBorrow="true" validationQuery="select 1" url="jdbc:mysql://REDACTED.REDACTED.com:3306/REDACTED?autoReconnect=true&amp;zeroDateTimeBehavior=round" />

This is the Tomcat documentation I'm using as a reference:

https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html

Looking at Tomcat 7.0 jdbc documentation, the first thing that I notice is that the factory we have configured is not even mentioned in the documentation.

One person (who replied privately and not to commons-user) said that BOTH factory and type attribute values mentioned in the Tomcat 7.0 docs are completely wrong!  And another (who did reply via the list) said that the factory mentioned in the documentation is "the alternative pool" and that I should be using the one documented here instead.  But this page is REALLY lean on details of how to actually set up the resource:

https://tomcat.apache.org/tomcat-7.0-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP)_Configurations

=====================================

So now, with the sordid past covered, I have three main questions:

1) I think this is the really burning question in my mind: Why is the server opening NEW connections when there are dozens of them already open?  Does this mean that (as I suspect) all those connections are abandoned?  If so, why are they not getting cleaned up, when we have removeAbandoned set to true and removeAbandonedTimeout set to a RIDICULOUSLY low value of 30 seconds?  (We MUST increase this value.  The plan is one hour.)

2, background) We tried to reconfigure all the pools on our staging server to a new config.  They currently have a config much like what I shared above, but with smaller pool size numbers. Here's an example of what we tried to put in place:

        <Resource name="jdbc/REDACTED"
                        auth="Container"
factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
driverClassName="com.mysql.jdbc.Driver"
                        type="javax.sql.DataSource"
                        maxActive="10"
                        maxIdle="2"
                        minIdle="1"
                        maxAge="7200000"
                        removeAbandoned="true"
                        removeAbandonedTimeout="3600"
                        logAbandoned="true"
                        username="REDACTED"
                        password="REDACTED"
                        testOnBorrow="true"
                        testOnConnect="true"
                        testOnReturn="true"
                        testWhileIdle="true"
                        validationQuery="SELECT 1"
                        validationQueryTimeout="5"
                        validationInterval="5000"
                        minEvictableIdleTimeMillis="900000"
                        allowMultiQueries="false"
url="jdbc:mysql://REDACTED.REDACTED.com:3306/REDACTED?autoReconnect=true&amp;zeroDateTimeBehavior=round"
                        />

When we tried to start the service with a config like this for all of the pools (there are half a dozen of them), it failed to correctly start, getting too many connection errors.

The size of the pools on this staging server is a LOT smaller than what we have configured in production.  The DB server it connects to is a lot less beefy, and has a limit of 400 connections.  I am told by the developer that did the restart that when this service started up, it had made 140 connections to the dev/staging DB server before failing.  Which shouldn't be possible, with half a dozen pool configs that all have a maxActive set to 10.

2, question) Have I got something wrong in the new config? If I have, and there are recommendations for what I should change, please tell me WHY each change is recommended, and if possible, point me at relevant documentation.  If you can, tell me how I may have misread the Tomcat documentation, which I used extensively in coming up with the new config.

Side note/question:  One of the replies I got on the DBCP list referred me to this Javadoc:

https://commons.apache.org/proper/commons-dbcp/api-1.4/org/apache/commons/dbcp/BasicDataSource.html#getRemoveAbandoned()

Which basically says that the 1.4 version of the standard DBCP library won't even try to eliminate abandoned connections on a pool of size 60 until there are more than 57 connections open on the one pool.  This sounds like a REALLY bad idea.  The Tomcat jdbc documentation lists "abandonWhenPercentageFull" as an attribute, and the description there hints that Tomcat's jdbc implementation doesn't suffer from the same problem.  But with the unusual factory we have configured, I don't know if our configuration is even properly honored.  Does the tomcat pool have that same issue?

Because we have multiple pools defined, and some of those pools have a maxActive number that's pretty large, I'm betting that we run out of connections long before that one pool has opened 57 of them.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to