Is it possible to run this server with a basic Tomcat application under load
to rule out your application as the cause of the crash?
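
For example, something like ApacheBench against one of the stock example
servlets would put sustained load on Tomcat without your application in the
picture (the URL, request count and concurrency below are just placeholders;
adjust for your install):

ab -n 100000 -c 50 http://localhost:8080/examples/servlets/servlet/HelloWorldExample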

On Fri, Feb 12, 2010 at 4:20 AM, Carl <c...@etrak-plus.com> wrote:

> This problem continues to plague me.
>
> A quick recap so you don't have to search your memory or archives.
>
> The 10,000-foot view:  new Dell T105 and T110, Slackware 13.0 (64-bit),
> latest Java (64-bit) and latest Tomcat.  The machines run only Tomcat and a
> small, special-purpose Java server (which I have also moved to another
> machine to make certain it wasn't causing any problems).  Periodically,
> Tomcat just dies, leaving no tracks in any log that I have been able to
> find.  The application ran on a Slackware 12.1 (32-bit) server for several
> years without problems (except for application bugs).  I have run Memtest86
> for 30 hours on the T110 with no problems reported.
>
> More details: the Dell T105 has an AMD processor and (currently) 8 GB of
> memory.  The T110 has a Xeon 3440 processor and 4 GB.  The current Java
> version is 1.6.0_18-b07; the current Tomcat version is 6.0.24.
>
> The servers are lightly loaded, with fewer than 100 sessions active at any
> one time.
>
> All of the following trials have produced the same results:
>
> 1.  Tried openSuse 64 bit.
>
> 2.  Tried 32 bit Slackware 13.
>
> 3.  Increased the memory in the T105 from 4 GB to 6 GB and finally to 8 GB.
>
> 4.  Fiddled with the JAVA_OPTS settings in catalina.sh.  The current
> settings are:
>
> JAVA_OPTS="-Xms512m -Xmx512m -XX:PermSize=384m -XX:MaxPermSize=384m
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/usr/local/tomcat/logs"
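>
> (I have considered also routing GC output to its own file, so the GC
> history survives even if catalina.out is lost, e.g. by adding something
> like
>
> -verbose:gc -Xloggc:/usr/local/tomcat/logs/gc.log
>
> to JAVA_OPTS, but I have not tried that yet.)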
>
> I can see the incremental GC activity in both catalina.out and VisualVM.
> Note the fairly small (512 MB) heap, but watching VisualVM indicates this is
> sufficient (when a failure occurs, VisualVM reports the last amount of
> memory used, and it is always well under the max in both the heap and
> permGen).
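>
> (I assume a command-line check such as
>
> jstat -gcutil <pid> 5000
>
> which prints heap and permGen utilization every five seconds, would show
> the same picture; <pid> is the Tomcat java process id.)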
>
> More information about the failures:
>
> 1.  They are clean kills: I can restart Tomcat immediately after a failure
> and there is no port conflict.  As I understand it, this implies either
> that the Linux process was killed (I have manually killed the java process
> with kill -9 and observed the same result I see when the system fails) or
> that Tomcat was shut down normally, e.g. via shutdown.sh (that always
> leaves tracks in catalina.out, and I am not seeing any, so I do not believe
> this is the case).  A wrapper that logs the JVM's exit status might settle
> this; see the sketch after this list.
>
> 2.  They appear to be load related.  On heavy processing days, the system
> might fail every 15 minutes, yet with lighter processing it can run for up
> to 10 days without failure.  I have found a way to force more frequent
> failures.  We have four WARs deployed (call them A, B, C and D).  They are
> all the same application; we deploy multiple copies to provide access to
> different databases.  A user reaches the correct application via
> https://xx.com/A, /B, etc.  A is used for production while the others have
> specific purposes, so A is always in use and the others only periodically.
> If users start coming in on B, C and/or D, the failure occurs within hours
> (Tomcat shuts down, bringing down all of the users, of course).  Note that
> even then the failure does not happen immediately.
>
> 3.  They do not appear to be caused by memory restrictions: 1) the old
> server had only 2 GB of memory and ran well, 2) I have tried adding memory
> to the new servers with no change in behavior, and 3) top and the Slackware
> system monitor indicate the system is not starved for memory.  In fact,
> yesterday, running on the T105 with 8 GB of memory, top never reported more
> than 6 GB in use (and zero swap), yet the system failed at about 4:00 PM.
>
> 4.  Most of the failures occur after some amount of processing.  We update
> the WARs and restart Tomcat each morning at 1:00 AM.  Most of the failures
> occur toward the end of the day, although heavy processing (or using
> multiple 'applications') may force one earlier (the earliest failure has
> been around 1:00 PM, on the heaviest processing day ever).  It is almost as
> if there is a bucket somewhere that fills up and, when full, causes the
> failure.  (To avoid any misunderstanding: there has never been an OOM
> condition reported anywhere that I can find.)
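>
> (Regarding the wrapper mentioned in point 1, I am thinking of something
> along these lines, run instead of startup.sh; the file names are mine:
>
> /usr/local/tomcat/bin/catalina.sh run >> /usr/local/tomcat/logs/console.log 2>&1
> echo "$(date): Tomcat exited with status $?" >> /usr/local/tomcat/logs/exit.log
>
> A process killed by signal N exits with status 128+N, so a kill -9 would
> show up as 137 and a normal exit as 0.)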
>
> Observations (or random musings):
>
> The fact that the failures occur after some amount of processing suggests
> the issue is related to memory usage and, potentially, caused by a memory
> leak in the application.  However, 1) I have never seen (in VisualVM) any
> issue with either the heap or permGen, and the incremental GCs reported in
> catalina.out look pretty normal, and 2) top, vmstat, the system monitor,
> etc. show no issues with memory.
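>
> (If it would help, I could also start capturing periodic class histograms
> to compare over the course of a day, e.g.:
>
> jmap -histo:live <pid> > histo-$(date +%H%M).txt
>
> where <pid> is the Tomcat java process id.  I have not done this yet.)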
>
> The failures look a lot like the work of the Linux OOM killer (which Mark
> or Chris suggested back at the beginning of this thread, now 2-3 months
> ago).  Does anyone have an idea where I could get information on tracking
> the Linux signals that could cause this?
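>
> (My naive assumption is that if the OOM killer were responsible, it would
> leave traces in the kernel log, e.g.:
>
> dmesg | grep -i 'out of memory'
> grep -i oom-killer /var/log/messages
>
> /var/log/messages is where I believe Slackware keeps kernel messages, but I
> have not confirmed that.  That still would not catch a kill sent by some
> other process, which is why I am asking.)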
>
> Thanks,
>
> Carl
