Anthony,

I have to get through this as quickly as possible, and I have never been able to rig up a stress test that duplicates what I am seeing in production, so I am basically using the production servers to work the problem out. When a server fails, I just redirect the traffic to the other server and try to analyze what happened. And if I can't keep the new servers up, I just move back to the old server (thank goodness I didn't rebuild that one when the new ones seemed to work).

Thanks,

Carl

----- Original Message -----
From: "Anthony J. Biacco" <abia...@formatdynamics.com>
To: "Tomcat Users List" <users@tomcat.apache.org>
Sent: Saturday, February 13, 2010 4:08 PM
Subject: RE: Tomcat dies suddenly


If #1 is correct, maybe you should just revert until you can do more testing outside production. Of course, that's only if you're not using some Tomcat 6/Java 1.6-specific features in your apps.

-Tony


-----Original Message-----
From: André Warnier <a...@ice-sa.com>
Sent: Saturday, February 13, 2010 1:56 PM
To: Tomcat Users List <users@tomcat.apache.org>
Subject: Re: Tomcat dies suddenly

Carl wrote:
Chris,

I find it hard to believe two brand new machines with different
processors, etc. would have a hardware problem that showed itself in
exactly the same way.  Further, I have run Memtest86 for 30 hours on one
of the servers and it showed nothing (although, as Chuck pointed out,
the test may not have handled the cores correctly or may not have
changed the temperature sufficiently to cause the problem we are
seeing).  I have not found a memory test specifically for 64-bit processors.

Right.
After rescanning your posts (and feel free to correct any
discrepancies), here is a summary :

1) you never saw this issue under a previous JVM 1.5 and Tomcat version
5.5.x

2) the problem happens on two separate servers, which seems to rule out
a common server hardware issue

3) it happens under different versions of Linux, which seems to rule out
a problem with one particular Linux distribution

4) it seems to be a SegFault in the JVM, leaving a core dump but no
traces in the logs.
(In my experience, SegFaults usually happen when trying to execute
something which is not valid executable code for the platform at hand.)
Anyway, it does not seem to be due to running out of some resource, nor
to a hidden call to System.exit().

5) not quite sure of this anymore, but it seems to happen also on
different JVMs, which would tend to rule out a problem with a particular
JVM port.

6) it does not happen immediately, nor in any obvious way related to
what is being processed, except that it seems to happen more readily
under load

7) it is obviously not a common problem with either JVM or Tomcat, or we
would have had laments from others by now

8) I don't know how a Java/Tomcat webapp could trigger a
SegFault on its own, other than by having the JVM participate in it.
And apparently your apps are working fine up to the moment of the sudden
death, so for once they do not appear to be among the usual suspects.

9) This, in one of your earlier posts, triggered my curiosity :
quote
This Tomcat is straight out of the box except for some modifications to
JAVA_OPTS in tomcat/bin/catalina.sh (editor's note: canonically, a better place
would be setenv.sh) and opening up ports and turning on SSL in
tomcat/conf/server.xml.
unquote
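
For reference, a minimal sketch of what keeping such JAVA_OPTS changes in
setenv.sh instead of catalina.sh might look like.  The option shown is an
illustrative assumption, not Carl's actual settings (the thread does not
list them), and the log directory is hypothetical.  -XX:ErrorFile is a
HotSpot option naming the fatal error log the JVM tries to write when it
crashes; %p is replaced by the process id.

```shell
# bin/setenv.sh -- picked up by catalina.sh at startup if the file exists,
# so catalina.sh itself can stay as shipped.

# Hypothetical directory; it must exist and be writable by the Tomcat user.
CRASH_LOG_DIR="/var/log/tomcat"

# Append to whatever JAVA_OPTS is already set rather than overwriting it.
JAVA_OPTS="${JAVA_OPTS:-} -XX:ErrorFile=${CRASH_LOG_DIR}/hs_err_pid%p.log"
export JAVA_OPTS
```

Without this option the JVM normally tries to write hs_err_pid<pid>.log
into its working directory, so that directory may be worth checking too.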

So, maybe two suggestions, taking into account that I am just making
wild guesses here (but that's pretty much what everyone by now is doing
too, so I don't feel too bad) :

- have you tried running Tomcat from the command-line, with
STDOUT/STDERR to the console ?  Maybe something shows up there which
doesn't show up anywhere else ?

- what about this SSL ? It seems to me a likely candidate: something
that is maybe not used all the time, probably calls stuff which should
be native code, and is usually provided separately from Tomcat.
Can you turn it off and still be operational ?
Also, if it is provided separately, it should probably be relatively
"grouped" in some directory, making it easier to check if everything is
as it should be.
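
The first suggestion above could be tried roughly as follows.  This is a
sketch only: the CATALINA_HOME default and the console log path are
assumptions, and the already running instance would have to be stopped
first.  "catalina.sh run" starts Tomcat in the current window instead of
in the background, so anything the JVM prints before dying stays visible.

```shell
# Assumed install location; adjust to the real one.
CATALINA_HOME="${CATALINA_HOME:-/opt/tomcat}"

# Hypothetical file that keeps a copy of everything printed to the console.
CONSOLE_LOG="/tmp/catalina-console.log"

if [ -x "$CATALINA_HOME/bin/catalina.sh" ]; then
    # "run" keeps Tomcat in the foreground; 2>&1 merges STDERR into STDOUT
    # so tee captures both streams while still showing them on screen.
    "$CATALINA_HOME/bin/catalina.sh" run 2>&1 | tee "$CONSOLE_LOG"
else
    echo "catalina.sh not found under $CATALINA_HOME" >&2
fi
```

If the JVM dies again, the last lines of the console log are the place to
look for messages that never made it into catalina.out.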

Note also that apart from a direct hardware similarity between the
servers on which it happens, another common element seems to be the
place at which it happens, namely the server room.  This is a long shot,
but a power supply issue may also provoke hardware failures.  Or if your
server room is on top of a mountain, or near a particle accelerator ?
(re relativistic gamma rays, dark energy and all that stuff).
;-)


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

