First, thank you again for taking the time to help me.  Apache has
great communities.

On 3/22/2018 5:38 PM, Phil Steitz wrote:
> You must be looking at documentation describing how to use the
> alternative pool mentioned above (tomcat-jdbc).  The config you
> posted is correct for DBCP.

I'm looking at Tomcat documentation.

https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html

This Tomcat is the one bundled with Liferay 6.2.  It is version 7.0.42.

> Don't look at DBCP 2 code for troubleshooting the code you are
> running.  Either look at the repackaged sources inside the tomcat
> source, or find the version in the tomcat build files and go to the
> old DBCP / pool sources referenced there.

I have figured that out.  Felt pretty dumb when I realized I wasn't
looking at code from the correct Tomcat version.

> Of course, you *should* upgrade both TC and the DBCP it ships so you
> *can* look at that (much better) code.  See below for one reason why.

I hear what you're saying, and don't disagree ... but this is not the
kind of environment where I can just do an upgrade, even though
upgrading might make it all better.

We didn't download Tomcat -- we downloaded Liferay, which came with a
specific version of Tomcat already included.  Upgrading any significant
component (Liferay, Tomcat, and others) runs the risk that when we
restart the service, our web application won't work any more.  For any
upgrade, we have to spend a lot of resources trying the upgrade in a
staging environment, so we can be sure that everything still works. 
Because that's very time-consuming, we tend to not do a lot of
upgrading, at least of significant components, and our versions get
REALLY old.

This is also why I'm hesitant to move away from Tomcat's DBCP
implementation to Commons DBCP (particularly version 2), even though
that's exactly what I want to do.  Switching to a different library
might work seamlessly ... or it might completely break the application. 
Our customers get REALLY irritated when the websites we've built for
them don't work!

> One thing that could be going on is that in the old 1.x DBCP, 
> abandoned connection removal only happens when borrows are
> attempted.  So if you check out a lot of connections, abandon them
> and don't ask for more, they won't get closed as abandoned until you
> borrow a new one.  In DBCP 2, the removeAbandoned property is split
> into two different properties:  removeAbandonedOnBorrow (the old
> behavior) and removeAbandonedOnMaintenance.  The second one makes
> abandoned connection removal run on pool maintenance (so will not
> have to wait until a borrow is attempted).

I don't know if anyone needs me to actually back up and describe what's
happening that led me down this rabbit hole, but that's what I'm going
to do:

The master MySQL server in our environment has a max connection limit
configured at 600 connections.

Every now and then, we start getting website failures, because all the
connections are in use and the connection pools can't make any more
connections.  Looking at the connections on the MySQL side, the vast
majority are idle (the command is "Sleep" on the server processlist),
and have been idle for several hours.

There are five main webservers and a handful of ancillary systems that
also connect to the database.  When the problem happens, the connection
count from each webserver has gotten up near 100, and sometimes over
100.  The surplus connections are definitely the ones configured in
Tomcat.  Liferay has its own DB config for its own data (using c3p0 for
pooling), and although I often see a higher number of connections to
that database than I would expect, I've never seen the idle time on
those connections above one minute, so I'm not concerned about that
pool, beyond some minor tweaks.

The frequency of the connection-related failures has been increasing, so
in response, I have set up monitoring that will send us an alarm when
the server reaches 550 connections.  This has allowed us to kill idle
connections and prevent customer-visible problems a couple of times
already, but we still have a fundamental issue to correct.
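
To make the symptom concrete, here is a rough sketch of how the
processlist can be summarized per client host.  It is not the monitoring
we actually run; the class and method names are made up for
illustration, and it assumes the rows were already fetched from SHOW
PROCESSLIST, where Host has the form "host:port" and Time is in seconds.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tomcat, Liferay, or MySQL Connector/J.
final class IdleConnectionSummary {

    /** One SHOW PROCESSLIST row, reduced to the columns used here. */
    static final class Row {
        final String host;      // "host:port" as reported by MySQL
        final String command;   // e.g. "Sleep" or "Query"
        final int timeSeconds;  // seconds spent in the current state

        Row(String host, String command, int timeSeconds) {
            this.host = host;
            this.command = command;
            this.timeSeconds = timeSeconds;
        }
    }

    /** Counts, per client host, the connections in "Sleep" for at least minIdleSeconds. */
    static Map<String, Integer> longIdleByHost(List<Row> rows, int minIdleSeconds) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Row r : rows) {
            if ("Sleep".equals(r.command) && r.timeSeconds >= minIdleSeconds) {
                String host = stripPort(r.host);
                Integer old = counts.get(host);
                counts.put(host, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    /** Drops the ":port" suffix so all connections from one machine group together. */
    static String stripPort(String host) {
        int colon = host.lastIndexOf(':');
        return colon < 0 ? host : host.substring(0, colon);
    }

    /** True when the total connection count has reached the alarm level (550 for us). */
    static boolean shouldAlarm(int totalConnections, int alarmAt) {
        return totalConnections >= alarmAt;
    }
}
```

With a one-hour threshold (3600 seconds), a summary like this would
show which webservers are holding the long-idle connections.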

I do not yet have any information that indicates whether Tomcat's DBCP
thinks those connections are idle or active.  I have reason to suspect
that they are active, and have not been returned to the pool (closed). 
I've worked out a way with one of our developers to add logging that
displays the active and idle connection counts, but it's not yet in
production.  If those connections were idle, as the MySQL server thinks
they are, it really seems like DBCP would be choosing to re-use a
connection that it's already got, instead of trying to create a brand
new one and failing.
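
The logging I have in mind looks roughly like the sketch below.  It is
not the code our developer wrote.  The class name PoolStats is made up,
and it uses reflection only so that it compiles without the
Tomcat-repackaged DBCP classes on the classpath.  It assumes the object
bound in JNDI exposes getNumActive(), getNumIdle() and getMaxActive(),
which BasicDataSource in DBCP 1.x does.

```java
import java.lang.reflect.Method;

// Hypothetical helper for logging pool counters; not part of Tomcat or DBCP.
final class PoolStats {

    /** Invokes a public no-argument getter that returns an int, e.g. getNumActive. */
    static int intGetter(Object target, String name) throws Exception {
        Method m = target.getClass().getMethod(name);
        return ((Number) m.invoke(target)).intValue();
    }

    /** Builds one log line with the pool's active, idle, and maximum counts. */
    static String describe(String label, Object dataSource) throws Exception {
        int active = intGetter(dataSource, "getNumActive");
        int idle = intGetter(dataSource, "getNumIdle");
        int maxActive = intGetter(dataSource, "getMaxActive");
        return label + ": active=" + active + " idle=" + idle
                + " maxActive=" + maxActive;
    }
}
```

In the webapp, the object would come from the usual JNDI lookup, e.g.
new InitialContext().lookup("java:comp/env/jdbc/REDACTED"), and the
resulting line would go to whatever logger the application already uses.
If active stays high while MySQL reports those connections as "Sleep",
that would support the theory that they were never closed.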

So I am chasing abandoned connection removal.  We have it configured,
but it's not working.  The config is lacking things I think it needs,
but as far as I could tell, there is enough for abandoned connection
removal to work.  I suspect it's not working because I'm using a
different factory than the documentation says I should be using ... or
because the config we've got (which I inherited and did not create) is
incorrect.  I acknowledge that the problem might be a bug in
tomcat-dbcp, one that upgrading might fix.

The Resource configuration I shared most recently is what I'm *planning*
to put in place.  This is the config I inherited that we actually have
in place now (slightly redacted):

        <Resource name="jdbc/REDACTED" auth="Container"
                  factory="org.apache.tomcat.dbcp.dbcp.BasicDataSourceFactory"
                  driverClassName="com.mysql.jdbc.Driver" type="javax.sql.DataSource"
                  maxActive="60" maxIdle="10" maxWait="30000" removeAbandoned="true"
                  removeAbandonedTimeout="30" username="REDACTED" password="REDACTED"
                  testOnBorrow="true" validationQuery="select 1"
                  url="jdbc:mysql://encore.REDACTED.com:3306/REDACTED?autoReconnect=true&amp;zeroDateTimeBehavior=round"
        />

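The removeAbandonedTimeout setting in that config is 30 seconds.  If
abandoned connection removal were actually working, I think we'd have a
lot of very irritated customers unable to run reports: those usually
take a few minutes to return results, so I'd expect their connections
to be reclaimed as abandoned before the report finished.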

If abandoned connection removal only occurs at time of borrow with the
older version, I don't think that's a problem.  Accessing the pages that
fail when the problem happens does require at least one database
lookup.  In the code, the database lookup starts with a
"ds.getConnection()" call.  That's a borrow, isn't it?

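For reference, this is the borrow-and-return pattern I would expect
around that call.  It is only a sketch, not our actual code, and the
class and method names are made up.  I have not yet confirmed that every
code path in our application closes its connection like this.  It uses
try/finally rather than try-with-resources so it would also compile on a
JVM older than Java 7.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

// Hypothetical example class; only the java.sql / javax.sql calls are real API.
final class BorrowAndReturn {

    /** Runs a query that yields one int, always handing the connection back. */
    static int queryInt(DataSource ds, String sql) throws SQLException {
        Connection conn = ds.getConnection();      // borrow from the pool
        try {
            Statement st = conn.createStatement();
            try {
                ResultSet rs = st.executeQuery(sql);
                try {
                    if (!rs.next()) {
                        throw new SQLException("no rows returned for: " + sql);
                    }
                    return rs.getInt(1);
                } finally {
                    rs.close();
                }
            } finally {
                st.close();
            }
        } finally {
            // On a pooled connection, close() returns it to the pool
            // instead of closing the underlying MySQL connection.
            conn.close();
        }
    }
}
```

If some code path skips the final close(), the pool keeps counting that
connection as active while MySQL sees it as sleeping, which would match
what I'm seeing.
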
Thanks,
Shawn

