This problem continues to plague me.

A quick recap so you don't have to search your memory or archives.

The 10,000-foot view: new Dell T105 and T110, Slackware 13.0 (64-bit), latest Java (64-bit) and latest Tomcat. The machines run only Tomcat and a small, special-purpose Java server (which I have also moved to another machine to make certain it wasn't causing any problems). Periodically, Tomcat just dies, leaving no tracks in any log that I have been able to find. The application ran on a Slackware 12.1 (32-bit) server for several years without problems (except for application bugs). I have run Memtest86 for 30 hours on the T110 with no problems reported.

More details: the Dell T105 has an AMD processor and (currently) 8 GB of memory. The T110 has a Xeon 3440 processor and 4 GB of memory. The current Java version is 1.6.0_18-b07. The current Tomcat version is 6.0.24.

The servers are lightly loaded, with fewer than 100 sessions active at any one time.

All of the following trials have produced the same results:

1.  Tried 64-bit openSUSE.

2.  Tried 32-bit Slackware 13.0.

3.  Increased the memory in the T105 from 4 GB to 6 GB and finally to 8 GB.

4.  Fiddled with the JAVA_OPTS settings in catalina.sh. The current settings are:

JAVA_OPTS="-Xms512m -Xmx512m -XX:PermSize=384m -XX:MaxPermSize=384m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/tomcat/logs"

I can see the incremental GC activity in both catalina.out and VisualVM. Note the fairly small (512 MB) heap, but watching VisualVM indicates this is sufficient (when a failure occurs, VisualVM reports the last amount of memory used, and this is always well under the max in both heap and permGen).
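
As an aside, to make sure the last heap and GC readings survive a kill, I am thinking of logging them to a separate file every few seconds. This is just a sketch, assuming the Tomcat PID can be picked up from jps; the log path is simply the one I already use for heap dumps:

# grab the Tomcat PID and append -gcutil samples every 10 seconds
JVM_PID=`jps -l | grep org.apache.catalina.startup.Bootstrap | cut -d' ' -f1`
jstat -gcutil $JVM_PID 10s >> /usr/local/tomcat/logs/jstat.log &
# alternatively, adding -Xloggc:/usr/local/tomcat/logs/gc.log to JAVA_OPTS keeps
# the GC lines out of catalina.out and in a file of their own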

More information about the failures:

1. They are clean kills: I can restart Tomcat immediately after a failure and there is no port conflict. As I understand it, this implies either that the Linux process was killed (I have manually killed the java process with kill -9 and seen the same result that I observe when the system fails), or that Tomcat was shut down normally, e.g., via shutdown.sh (a normal shutdown always leaves tracks in catalina.out, and I am not seeing any, so I do not believe this is the case). See the wrapper sketch after this list for how I plan to confirm which.

2. They appear to be load related. On heavy processing days the system might fail every 15 minutes, yet with lighter processing it can run for up to 10 days without a failure. I have found a way to force a more frequent failure. We have four WARs deployed (I will call them A, B, C and D). They are all the same application, but we use this arrangement to give access to different databases; a user reaches the right one via https://xx.com/A, /B, etc. A is used for production while the others have specific purposes, so A is always in use while the others are used periodically. If users start coming in on B, C and/or D, the failure occurs within hours (Tomcat shuts down, bringing all of the users down, of course). Note that even then the failure does not happen immediately.

3. They do not appear to be caused by memory restrictions: 1) the old server had only 2 GB of memory and ran well, 2) I have tried adding memory to the new servers with no change in behavior, and 3) the indications from top and the Slackware system monitor are that the system is not starved for memory. In fact, yesterday, running on the T105 with 8 GB of memory, top never reported more than 6 GB in use (and 0 swap in use), yet it failed at about 4:00 PM.

4. Most of the failures occur only after some amount of processing. We update the WARs and restart the Tomcats each morning at 1:00 AM. Most of the failures occur toward the end of the day, although heavy processing (or using multiple 'applications') can force one to happen earlier (the earliest failure has been around 1:00 PM, on the heaviest processing day ever). It is almost as if there is a bucket somewhere that gets filled up and, when full, causes the failure; the monitoring sketch after this list is one way I intend to look for that. (So there is no misunderstanding: there has never been an OOM condition reported anywhere that I can find.)
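
Regarding item 1, one way I plan to confirm that the JVM really is being killed by a signal (rather than exiting on its own) is a small wrapper around catalina.sh that records the exit status. This is only a sketch; wrapper.sh is my own name for it and the paths match my install:

#!/bin/sh
# wrapper.sh - run this instead of startup.sh so the JVM stays in the foreground
CATALINA_HOME=/usr/local/tomcat
$CATALINA_HOME/bin/catalina.sh run >> $CATALINA_HOME/logs/wrapper.out 2>&1
STATUS=$?
# exit status is 128 + signal number: 137 = SIGKILL (kill -9 or the OOM killer),
# 143 = SIGTERM; a clean stop via shutdown.sh should show up as 0
echo "`date`: Tomcat exited with status $STATUS" >> $CATALINA_HOME/logs/wrapper.out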
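
And for the bucket idea in item 4, I intend to log memory, file descriptor and thread counts for the Tomcat process once a minute, so whatever is growing shows up in the minutes before a failure. Again just a sketch, started in the background after the 1:00 AM restart; monitor.sh and the log path are my own:

#!/bin/sh
# monitor.sh - hypothetical periodic resource logger
LOG=/usr/local/tomcat/logs/resources.log
while true; do
    PID=`jps -l | grep org.apache.catalina.startup.Bootstrap | cut -d' ' -f1`
    FDS=`ls /proc/$PID/fd 2>/dev/null | wc -l`
    THREADS=`ls /proc/$PID/task 2>/dev/null | wc -l`
    echo "`date` pid=$PID fds=$FDS threads=$THREADS" >> $LOG
    free -m >> $LOG
    sleep 60
done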

Observations (or random musings):

The fact that the failures occur after some amount of processing suggests the issue is related to memory usage, potentially a memory leak in the application. However, 1) I have never seen (from VisualVM) any issue with either heap or permGen, and the incremental GCs reported in catalina.out look pretty normal, and 2) top, vmstat, the system monitor, etc. are not showing any issues with memory.
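
If there is a leak that the summary numbers are hiding, taking a class histogram with jmap every hour or so and diffing the counts might show it. Only a sketch, and it assumes the JDK's jps and jmap are on the PATH:

# hourly class histogram, written next to the other Tomcat logs
PID=`jps -l | grep org.apache.catalina.startup.Bootstrap | cut -d' ' -f1`
jmap -histo $PID > /usr/local/tomcat/logs/histo.`date +%Y%m%d%H%M`.txt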

The failures look a lot like the work of the Linux OOM killer (which Mark or Chris suggested back at the beginning, now 2-3 months ago). Does anyone have an idea where I could get information on tracking the Linux signals that could cause this?
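
So far the only places I have thought to look are the kernel logs for OOM-killer messages, plus (from some reading) an audit rule to catch user-space kill() callers. The audit rule assumes auditd is installed and is something I have only read about, not tested, so corrections are welcome:

# the OOM killer logs through the kernel, so it should leave traces here
dmesg | grep -i "out of memory"
grep -i oom /var/log/messages /var/log/syslog

# if nothing is there, an audit rule can record user-space kill() callers
# (it will not see the OOM killer itself, which kills from inside the kernel)
auditctl -a exit,always -F arch=b64 -S kill -k tomcat_kill
ausearch -k tomcat_kill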

Thanks,

Carl



