Hello, I'd rather narrow the problem down a bit more before upgrading, to make sure that's really the issue.
It happened again. This time I took a thread dump of the instance that went down (strictly speaking, only the DB connection pool becomes unresponsive, but I remove the instance from my load balancer when that happens). The thread dump shows much the same thing the logs showed when I shut down Tomcat: a very long list of stack traces, all of them stuck trying to get a connection from the pool. An example from the thread dump:

    "http-apr-8080-exec-805" #840 daemon prio=5 os_prio=0 tid=0x00007f50d00c1800 nid=0x5fbc waiting on condition [0x00007f50c62e7000]
       java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x00000000ec44cf38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
            at org.apache.tomcat.dbcp.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:582)
            at org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:439)
            at org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
            at org.apache.tomcat.dbcp.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:118)
            at org.apache.tomcat.dbcp.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1412)

Questions:

1) In what order is a thread dump printed? I'd like to know which thread in the dump came first (chronologically), so that I can tell which thread caused the actual problem. Is that the thread at the beginning of the dump or at the end?

2) The thread dump was so long that I can't see its beginning. Does anyone know how to get the thread dump in smaller pieces so the beginning doesn't get cut off when it's long?
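On question 2: the cut-off is most likely the terminal or console buffer rather than the dump itself, so writing the dump straight to a file should avoid it, e.g. `jstack <pid> > dump.txt` with the JDK's jstack, or `kill -3 <pid>`, which makes the JVM print the dump to its standard output (catalina.out when Tomcat is started with the default scripts). On question 1: a thread dump is a single snapshot and, as far as I know, it does not record when each thread started waiting, so a thread's position in the dump is not a reliable indicator of which request arrived first. Once the dump is in a file, a short program can at least count and list the threads parked in the pool. This is a minimal sketch: the class name is hypothetical, the matched frame `GenericObjectPool.borrowObject` is taken from the trace above, and it assumes the usual jstack layout where each thread section starts with the thread name in double quotes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: scans a thread dump saved to a text file
 * (e.g. "jstack <pid> > dump.txt") and lists the threads whose stack
 * contains the DBCP pool's borrowObject() frame.
 */
public class PoolWaitScanner {

    // Frame that marks a thread waiting for a pooled connection (from the trace above).
    static final String BORROW_FRAME = "GenericObjectPool.borrowObject";

    /** Returns the names of threads whose stack contains BORROW_FRAME, in dump order. */
    static List<String> threadsWaitingOnPool(List<String> lines) {
        List<String> waiting = new ArrayList<>();
        String currentThread = null;
        boolean alreadyCounted = false;
        for (String line : lines) {
            if (line.startsWith("\"")) {
                // A thread section starts with its quoted name, e.g. "http-apr-8080-exec-805" #840 ...
                int end = line.indexOf('"', 1);
                currentThread = end > 0 ? line.substring(1, end) : line;
                alreadyCounted = false;
            } else if (currentThread != null && !alreadyCounted && line.contains(BORROW_FRAME)) {
                waiting.add(currentThread);
                // borrowObject appears twice in each stack; count each thread only once.
                alreadyCounted = true;
            }
        }
        return waiting;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        List<String> waiting = threadsWaitingOnPool(lines);
        System.out.println(waiting.size() + " threads waiting for a pooled connection");
        waiting.forEach(name -> System.out.println("  " + name));
    }
}
```

Compile and run it with `javac PoolWaitScanner.java && java PoolWaitScanner dump.txt`. Matching on a frame name is crude, but it is enough to see whether nearly every request thread is stuck in the same place.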
Possible causes: is it possible that someone is acting maliciously, deliberately hitting certain pages many times to make the pool lock up? I suspect this because when I take a thread dump while my application is working normally, it doesn't show anything suspicious. That leads me to believe the problem is not cumulative but happens all at once. Then, when I look at the problematic thread dump and see so many stack traces waiting to access my database, it makes me think they all requested connections at about the time the pool locked up.

Any thoughts?

Thanks a lot.

On Mon, Sep 12, 2016 at 9:37 PM, Mark Thomas <ma...@apache.org> wrote:

> On 12/09/2016 19:02, Yuval Schwartz wrote:
> > Hey Mark, thanks a lot.
> >
> > On Mon, Sep 12, 2016 at 4:42 PM, Mark Thomas <ma...@apache.org> wrote:
> >
> >> On 12/09/2016 11:54, Yuval Schwartz wrote:
> >
> > <snip/>
> >
> >> It might also be a bug in the connection pool that has been fixed.
> >> Upgrading to the latest 8.0.x (or better still the latest 8.5.x) should
> >> address that.
> >>
> >
> > I'll look for a bug report on this (although I haven't found anything as
> > of yet).
>
> There was one around XA connections that could result in a leak. The
> others, you'd need to dig into the DBCP change log and Jira for.
>
> > I wouldn't mind upgrading but do you think this could be a bug? I've been
> > running my application with this setup for about 8 months; the problem
> > only started in the last week.
>
> That begs the question what changed a week ago?
>
> Mark
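Regarding the burst-versus-leak question: a connection that was borrowed and never returned does not appear in a thread dump at all, because no thread is holding it on its stack, so a clean dump from a healthy instance does not by itself rule out a gradual leak. Also, the `LinkedBlockingDeque.takeFirst` frame in the trace suggests the pool is configured to wait indefinitely; as far as I know, DBCP2 only uses the untimed `takeFirst()` when the maximum wait is negative, which is the default (`maxWaitMillis` on the Tomcat `Resource`). The toy model that follows is a hypothetical class built on `java.util.concurrent.Semaphore`, not DBCP's API. It only illustrates how leaked connections shrink the pool until the next request has to wait, and how a bounded wait turns an indefinite hang into a visible failure.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Toy model of a bounded connection pool (hypothetical class, not DBCP's API).
 * Each permit stands for one connection: borrow() takes a permit, release() returns it.
 */
public class ToyPool {
    private final Semaphore permits;

    ToyPool(int maxTotal) {
        // Fair semaphore: queued waiters are served in arrival order.
        this.permits = new Semaphore(maxTotal, true);
    }

    /** Waits at most maxWaitMillis for a connection; returns false on timeout or interrupt. */
    boolean borrow(long maxWaitMillis) {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(2);

        // Well-behaved caller: always return the connection in a finally block.
        if (pool.borrow(100)) {
            try {
                // ... run a query ...
            } finally {
                pool.release();
            }
        }
        System.out.println("after normal use: " + pool.available() + " available");

        // Leaky caller: borrows but never releases. No thread holds these
        // connections, so a thread dump gives no hint that they are gone.
        pool.borrow(100);
        pool.borrow(100);
        System.out.println("after two leaks: " + pool.available() + " available");

        // The next request has to wait; with a bounded wait it fails instead of parking forever.
        boolean got = pool.borrow(100);
        System.out.println("next borrow succeeded? " + got);
    }
}
```

Running `main` prints 2 available after normal use, 0 after the two leaks, and `false` for the next borrow. In the real pool, DBCP's abandoned-connection logging (`logAbandoned`) is the usual way to find out where leaked connections were borrowed.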