Re: Help in diagnosing server unresponsiveness

Igor Cicimov Tue, 05 Feb 2013 19:15:47 -0800

On Wed, Feb 6, 2013 at 1:15 PM, Zoran Avtarovski <zo...@sparecreative.com> wrote:


> Here's some updated observations after a not quite incident (CPU and
> memory spiked but the app is still running):
>
> 1. Yesterday we had a 90% CPU spike at a time where there was absolutely
> no server traffic. Verified through both the HTTP logs and the mod_jk
> logs. The CPU spiked and recovered back to average levels.
> 2. Used memory spiked at 10GB from a pre incident average of 500MB
> throughout 2 busy days without incident
> 3. Used memory has only gone back down to 4GB and is holding at this level
> 4. The Used physical memory went up from 2GB to 14GB and has stayed there
> 5. Garbage collector time spikes to 24.0. I think with JavaMelody it means
> that GC took 24% of the CPU??
>
> So I think our issues are related to GC. Is there a way to trigger more
> frequent GC which will hopefully be less resource intensive?
>
> And why have the memory usage levels not recovered?
>
> Z.
>
>
Zoran,

First I would like to recommend the following document for reading:
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#cms

It explains GC in JVM 1.6, including the settings for the Concurrent
Collector (CMS), which is the one you are using. The GC values in the log
file are the time the particular collector spent on the operation, so the
24.0 probably means 24 seconds, which could mean that your applications were
unresponsive for up to 24 seconds during the CMS GC.
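To check whether a figure like 24.0 is seconds or a percentage, it may help to read the raw numbers straight from the JVM. The following is only a sketch using the standard java.lang.management API; the class name GcTimes and the helper gcOverheadPercent are made up for illustration, and I am not assuming anything about how JavaMelody derives its own value.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimes {
    /** Percentage of a wall-clock interval spent in GC; both arguments in milliseconds. */
    static double gcOverheadPercent(long gcMillis, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return 100.0 * gcMillis / intervalMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means the
            // collector does not report that value.
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        // Illustrative numbers only: 24 s of GC inside a 100 s window is 24%.
        System.out.println(gcOverheadPercent(24000, 100000) + "% of the interval in GC");
    }
}
```

Sampling these counters twice, some seconds apart, and feeding the differences to gcOverheadPercent gives the GC share for that window.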

As explained in the document linked above, the GC has minor and major
collection phases. During a minor collection, objects are either collected
from the so-called young generation or promoted to the old (tenured)
generation. The more of these objects pile up in the old generation, the
more frequently the major GC needs to run. Major collections usually take
much longer than minor ones (they have to clean a much bigger space), while
minor ones run more frequently. Now, what you need to find out is what is
really causing your problem. If your application creates a lot of new
objects then you'll have lots of minor GCs running. The more of these
objects survive, the more get moved to the old generation space, and then
you'll have lots of major GCs running as well. The danger here is that you
might end up with the GC running constantly, which will render your
application unusable due to pauses. So basically a badly written
application (not closing connections, not freeing objects, etc.) can cause
lots of problems, and in that case even the best GC tuning in the world
will not help you; your application(s) will eventually grind to a halt.
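One way to see whether objects are piling up in the old generation is to look at the heap memory pools from inside the JVM. This is a rough sketch with the standard MemoryPoolMXBean API; the class name HeapPools is hypothetical, and matching pool names on "Old Gen" or "Tenured" is an assumption, since the names depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapPools {
    /** Heuristic only: pool names vary by collector (assumption). */
    static boolean looksLikeOldGen(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    /** Formats bytes as whole MB; MemoryUsage uses -1 for an undefined limit. */
    static String mb(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage now = pool.getUsage();
            if (now == null) {
                continue; // pool is no longer valid
            }
            // Usage measured right after the last collection of this pool;
            // null when the pool does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            String kind = looksLikeOldGen(pool.getName()) ? "old" : "young";
            System.out.printf("%-22s [%s] used=%s max=%s afterLastGc=%s%n",
                    pool.getName(), kind, mb(now.getUsed()), mb(now.getMax()),
                    afterGc == null ? "n/a" : mb(afterGc.getUsed()));
        }
    }
}
```

If the old generation's afterLastGc value keeps climbing from one sample to the next, objects are surviving major collections, which points at the application rather than at GC settings.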

So read the document carefully and decide which use case fits you best.
If you are creating lots of new objects then maybe increasing the young
generation space can help (the new/old ratio is set with -XX:NewRatio; the
default depends on the JVM and platform, and NewRatio=3 gives 1:3).
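To make the ratio concrete, here is the arithmetic as a small sketch. The class name GenerationSizes and the 4096 MB heap are purely illustrative; a real JVM also aligns the sizes, and explicit settings such as -Xmn or -XX:NewSize take precedence over the ratio.

```java
public class GenerationSizes {
    /** Young generation size when the heap is split young:old = 1:newRatio. */
    static long youngBytes(long heapBytes, int newRatio) {
        if (heapBytes < 0 || newRatio < 1) {
            throw new IllegalArgumentException("heapBytes >= 0 and newRatio >= 1 required");
        }
        return heapBytes / (newRatio + 1);
    }

    /** Old (tenured) generation gets whatever the young generation does not. */
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long heap = 4096 * MB; // hypothetical heap size, not taken from the thread
        for (int ratio : new int[] {3, 2, 1}) {
            System.out.printf("NewRatio=%d: young=%d MB, old=%d MB%n",
                    ratio, youngBytes(heap, ratio) / MB, oldBytes(heap, ratio) / MB);
        }
    }
}
```

With these numbers, NewRatio=3 leaves 1024 MB for the young generation and 3072 MB for the old one, while NewRatio=1 splits the heap evenly.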

Also paste the results of the GC logs here. The link I provided has some
more useful settings and recommendations for the CMS collector. This
collector stops the application threads twice during each collection cycle
(the initial mark and remark pauses), so you need to check those times too.
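Before collecting logs it is worth confirming which GC options the running JVM was actually started with. The sketch below reads the startup arguments through the standard RuntimeMXBean; the class name GcFlags is made up, and the flag list is my assumption of what is useful on a HotSpot JVM of this generation (Java 6 era), not something taken from your setup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcFlags {
    /** True if any JVM startup argument begins with the given prefix. */
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not those passed to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] wanted = {
            "-XX:+UseConcMarkSweepGC",            // CMS collector selected
            "-verbose:gc",                        // basic GC logging
            "-XX:+PrintGCDetails",                // per-generation detail
            "-XX:+PrintGCTimeStamps",             // time since JVM start
            "-XX:+PrintGCApplicationStoppedTime", // length of application pauses
            "-Xloggc:"                            // GC log file location
        };
        for (String flag : wanted) {
            System.out.printf("%-36s %s%n", flag, hasFlag(jvmArgs, flag) ? "set" : "not set");
        }
    }
}
```

With the pause-time flag enabled, the log shows how long the application threads were actually stopped, which is the number to compare against the periods of unresponsiveness.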

Cheers & Pozdrav,
Igor
