First thing to do is thank you again for taking the time to help me. Apache has great communities.
On 3/22/2018 5:38 PM, Phil Steitz wrote:
> You must be looking at documentation describing how to use the
> alternative pool mentioned above (tomcat-jdbc). The config you
> posted is correct for DBCP.

I'm looking at Tomcat documentation.

https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html

The Tomcat we run is the one included with Liferay 6.2, which is 7.0.42.

> Don't look at DBCP 2 code for troubleshooting the code you are
> running. Either look at the repackaged sources inside the tomcat
> source, or find the version in the tomcat build files and go to the
> old DBCP / pool sources referenced there.

I have figured that out. I felt pretty dumb when I realized I wasn't looking at code from the correct Tomcat version.

> Of course, you *should* upgrade both TC and the DBCP it ships so you
> *can* look at that (much better) code. See below for one reason why.

I hear what you're saying, and don't disagree ... but this is not the kind of environment where I can just do an upgrade, even though upgrading might make it all better. We didn't download Tomcat; we downloaded Liferay, which came with a specific version of Tomcat already included. Upgrading any significant component (Liferay, Tomcat, and others) runs the risk that when we restart the service, our web application won't work anymore. For any upgrade, we have to spend a lot of resources trying it in a staging environment, so we can be sure that everything still works. Because that's very time-consuming, we tend not to do much upgrading, at least of significant components, and our versions get REALLY old.

This is also why I'm hesitant to move away from Tomcat's DBCP implementation to Commons DBCP (particularly version 2), even though that's exactly what I want to do. Switching to a different library might work seamlessly ... or it might completely break the application. Our customers get REALLY irritated when the websites we've built for them don't work!
> One thing that could be going on is that in the old 1.x DBCP,
> abandoned connection removal only happens when borrows are
> attempted. So if you check out a lot of connections, abandon them
> and don't ask for more, they won't get closed as abandoned until you
> borrow a new one. In DBCP 2, the removeAbandoned property is split
> into two different properties: removeAbandonedOnBorrow (the old
> behavior) and removeAbandonedOnMaintenance. The second one makes
> abandoned connection removal run on pool maintenance (so will not
> have to wait until a borrow is attempted).

I don't know if anyone needs me to back up and describe what led me down this rabbit hole, but that's what I'm going to do:

The master MySQL server in our environment has a max connection limit of 600. Every now and then, we start getting website failures because all the connections are in use and the connection pools can't make any more. Looking at the connections on the MySQL side, the vast majority are idle (the command is "Sleep" in the server processlist) and have been idle for several hours.

There are five main webservers and a handful of ancillary systems that also connect to the database. When the problem happens, the connection count from each webserver has gotten up near 100, and sometimes over 100. The surplus connections are definitely the ones configured in Tomcat. Liferay has its own DB config for its own data (using c3p0 for pooling), and although I often see more connections to that database than I would expect, I've never seen the idle time on those connections above one minute, so I'm not concerned about that pool, beyond some minor tweaks.

The frequency of the connection-related failures has been increasing, so in response, I have set up monitoring that will send us an alarm when the server reaches 550 connections.
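The suspicion that those "Sleep" connections are checked out and never returned can be illustrated with a toy pool. This is a hypothetical sketch, not Tomcat's DBCP: ToyPool, Handle, leakyLookup and safeLookup are made-up names, and a real pool tracks far more state. It only models the bookkeeping: a borrow that is never closed stays counted as active, so the pool keeps opening new physical connections instead of reusing idle ones, while the database just sees an idle session.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical toy pool (not Tomcat's DBCP). Shows how a borrowed connection
 * that is never closed stays "active" in the pool's bookkeeping.
 */
public class LeakDemo {

    /** Stands in for one physical MySQL connection. */
    static final class PhysicalConnection {}

    static final class ToyPool {
        private final Deque<PhysicalConnection> idle = new ArrayDeque<>();
        private int active = 0;
        private int created = 0;

        /** Borrow: reuse an idle physical connection if one exists, else open a new one. */
        Handle getConnection() {
            PhysicalConnection pc = idle.poll();
            if (pc == null) {
                pc = new PhysicalConnection();
                created++;
            }
            active++;
            return new Handle(this, pc);
        }

        /** Return: called from Handle.close(). */
        void release(PhysicalConnection pc) {
            active--;
            idle.push(pc);
        }

        int numActive() { return active; }
        int numIdle() { return idle.size(); }
        int created() { return created; }
    }

    /** What the application holds; close() gives the physical connection back to the pool. */
    static final class Handle implements AutoCloseable {
        private final ToyPool pool;
        private PhysicalConnection pc;

        Handle(ToyPool pool, PhysicalConnection pc) {
            this.pool = pool;
            this.pc = pc;
        }

        @Override
        public void close() {
            if (pc != null) { // closing twice is harmless
                pool.release(pc);
                pc = null;
            }
        }
    }

    /** Leaky pattern: borrows but never closes. */
    static void leakyLookup(ToyPool ds) {
        Handle conn = ds.getConnection();
        // ... run the query, then forget conn.close()
    }

    /** Safe pattern: try-with-resources closes the handle even if the query throws. */
    static void safeLookup(ToyPool ds) {
        try (Handle conn = ds.getConnection()) {
            // ... run the query
        }
    }

    public static void main(String[] args) {
        ToyPool ds = new ToyPool();
        for (int i = 0; i < 10; i++) safeLookup(ds);
        System.out.printf("safe:  active=%d idle=%d created=%d%n", ds.numActive(), ds.numIdle(), ds.created());
        for (int i = 0; i < 10; i++) leakyLookup(ds);
        System.out.printf("leaky: active=%d idle=%d created=%d%n", ds.numActive(), ds.numIdle(), ds.created());
    }
}
```

Running it prints `safe:  active=0 idle=1 created=1` and then `leaky: active=10 idle=0 created=10`: ten well-behaved lookups share one connection, while ten leaky ones pin ten. A real java.sql.Connection is also AutoCloseable (Java 7 and later), so the same try-with-resources form would apply to the application's ds.getConnection() calls.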
This has allowed us to kill idle connections and prevent customer-visible problems a couple of times already, but we still have a fundamental issue to correct.

I do not yet have any information that indicates whether Tomcat's DBCP thinks those connections are idle or active. I have reason to suspect that they are active and have not been returned to the pool (closed). I've worked out a way with one of our developers to add logging that displays the active and idle connection counts, but it's not yet in production. If those connections were idle, as the MySQL server thinks they are, it really seems like DBCP would choose to reuse a connection it already has instead of trying to create a brand new one and failing.

So I am chasing abandoned connection removal. We have it configured, but it's not working. The config is lacking things I think it needs, but as far as I can tell, there is enough for abandoned connection removal to work. I suspect it's not working because I'm using a different factory than the documentation says I should be using ... or because the config we've got (which I inherited and did not create) is incorrect. I acknowledge that the problem might be a bug in tomcat-dbcp, one that upgrading might fix.

The Resource configuration I shared most recently is what I'm *planning* to put in place. This is the config I inherited that we actually have in place now (slightly redacted):

<Resource name="jdbc/REDACTED"
    auth="Container"
    factory="org.apache.tomcat.dbcp.dbcp.BasicDataSourceFactory"
    driverClassName="com.mysql.jdbc.Driver"
    type="javax.sql.DataSource"
    maxActive="60"
    maxIdle="10"
    maxWait="30000"
    removeAbandoned="true"
    removeAbandonedTimeout="30"
    username="REDACTED"
    password="REDACTED"
    testOnBorrow="true"
    validationQuery="select 1"
    url="jdbc:mysql://encore.REDACTED.com:3306/REDACTED?autoReconnect=true&amp;zeroDateTimeBehavior=round"
/>

The removeAbandonedTimeout setting in that config is 30 seconds.
If this were actually working, I think we'd have a lot of very irritated customers unable to run reports, which usually take a few minutes to return results.

If abandoned connection removal only occurs at borrow time with the older version, I don't think that's a problem. Accessing the pages that fail when the problem happens does require at least one database lookup. In the code, the database lookup starts with a "ds.getConnection()" call. That's a borrow, isn't it?

Thanks,
Shawn
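To make the "only on borrow" question concrete, here is a toy model of *when* the sweep runs. It is not DBCP code; it assumes the rule described in the Commons DBCP 1.x documentation, that on-borrow removal is only triggered when getNumIdle() < 2 and getNumActive() > getMaxActive() - 3 (DBCP 2's removeAbandonedOnBorrow uses the same test against maxTotal), and that DBCP 2's removeAbandonedOnMaintenance runs on each evictor run, which only happens when timeBetweenEvictionRunsMillis is positive. Those rules should be checked against the DBCP version Tomcat 7.0.42 actually repackages. The method names are hypothetical. Under that assumption, with maxActive="60", a ds.getConnection() call is a borrow, but it only sweeps once more than 57 connections are checked out and fewer than 2 are idle.

```java
/**
 * Hypothetical toy model of when abandoned-connection removal runs.
 * Not DBCP code; the rules are assumptions based on the DBCP 1.x / 2 docs.
 */
public class AbandonedSweepModel {

    // DBCP 1.x removeAbandoned (and DBCP 2 removeAbandonedOnBorrow, with maxTotal):
    // evaluated only inside a borrow, and only when the pool is nearly exhausted.
    static boolean sweepOnBorrow(boolean removeAbandoned, int numIdle, int numActive, int maxActive) {
        return removeAbandoned && numIdle < 2 && numActive > maxActive - 3;
    }

    // DBCP 2 removeAbandonedOnMaintenance: evaluated on every evictor run regardless
    // of pool counts; the evictor only runs when the interval is positive.
    static boolean sweepOnMaintenance(boolean removeAbandonedOnMaintenance, long timeBetweenEvictionRunsMillis) {
        return removeAbandonedOnMaintenance && timeBetweenEvictionRunsMillis > 0;
    }

    // Once a sweep runs, a checked-out connection counts as abandoned if it has not
    // been used for longer than removeAbandonedTimeout (seconds).
    static boolean isAbandoned(long lastUsedMillis, long nowMillis, int removeAbandonedTimeoutSeconds) {
        return nowMillis - lastUsedMillis > removeAbandonedTimeoutSeconds * 1000L;
    }

    public static void main(String[] args) {
        int maxActive = 60;      // maxActive from the current Resource config
        int timeoutSeconds = 30; // removeAbandonedTimeout from the current Resource config

        // 40 leaked connections plus 5 idle: the borrow happens, but no sweep.
        System.out.println(sweepOnBorrow(true, 5, 40, maxActive));

        // Pool almost exhausted (58 active, 1 idle): the next borrow sweeps.
        System.out.println(sweepOnBorrow(true, 1, 58, maxActive));

        // Assuming lastUsed was last touched when a report query started 3 minutes ago,
        // a 30-second timeout treats that connection as abandoned.
        long now = System.currentTimeMillis();
        System.out.println(isAbandoned(now - 180_000L, now, timeoutSeconds));
    }
}
```

The three lines printed are `false`, `true`, `true`. If the model's assumptions hold, it would explain both observations at once: leaked connections can pile up well below the trigger point without ever being reclaimed, and once a sweep does fire, a 30-second timeout would also catch long-running report queries.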
