On Wed, Feb 6, 2013 at 1:15 PM, Zoran Avtarovski <zo...@sparecreative.com> wrote:
> Here's some updated observations after a not quite incident (CPU and
> memory spiked but the app is still running):
>
> 1. Yesterday we had a 90% CPU spike at a time where there was absolutely
> no server traffic. Verified through both the HTTP logs and the mod_jk
> logs. The CPU spiked and recovered back to average levels.
> 2. Used memory spiked at 10GB from a pre incident average of 500MB
> throughout 2 busy days without incident
> 3. Used memory has only gone back down to 4GB and is holding at this level
> 4. The Used physical memory went up from 2GB to 14GB and has stayed there
> 5. Garbage collector time spikes to 24.0. I think with JavaMelody it means
> that GC took 24% of the CPU??
>
> So I think our issues are related to GC. Is there a way to trigger more
> frequent GC which will hopefully be less resource intensive?
>
> And why have the memory usage levels not recovered?
>
> Z.

Zoran,

First, I would recommend reading the following document:

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#cms

It explains GC in JVM 1.6, including the settings for the Concurrent Collector (CMS), which is the one you are using.

The GC values in the log file are the time the particular collector spent on the operation, so the 24.0 would probably mean 24 seconds, which might mean that your applications were unresponsive for up to 24 seconds during the CMS GC.

As explained in the document linked above, the GC has minor and major collection phases. During a minor collection, objects are either collected from the so-called young generation or promoted to the old (tenured) generation. The more of these objects pile up in the old generation, the more frequently the major GC needs to run.
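To make the minor/major split concrete, here is a minimal sketch (my own illustration, not taken from the Oracle document) that asks the JVM for its per-collector counters through the standard java.lang.management API. The class name GcStats is made up. On a CMS setup the collectors are typically reported as ParNew (minor) and ConcurrentMarkSweep (major), but treat those names as an assumption and check what your JVM prints.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the thread): reads the JVM's own GC counters.
final class GcStats {

    // Returns collector name -> {collection count, accumulated time in ms}.
    // Either value is -1 when the JVM does not report it.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<String, long[]>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(),
                    new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue()[0]
                    + " collections, " + e.getValue()[1] + " ms total");
        }
    }
}
```

Note that this only reports on the JVM it runs in, so to look at Tomcat it would have to be called from inside a webapp (or the same beans read over JMX). Taking two snapshots a few minutes apart shows how often each collector ran in between and how much time it took.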
Major collections usually take much longer than minor ones (they have to clean a much bigger space), but the minor ones have to run more frequently.

Now, what you really need to find out is what is causing your problem. If your application creates a lot of new objects, you will have lots of minor GCs running. The more of these objects survive, the more get moved to the old generation space, and then you will have lots of major GCs running as well. The danger here is that you might end up with GC running constantly, which will render your application unusable due to pauses.

So basically a badly written application (one that does not close connections, does not release objects, and so on) can cause lots of problems, and in that case even the best GC tuning in the world will not help you; your application(s) will eventually grind to a halt.

So read the document carefully and decide which use case fits you best. If you are creating lots of new objects, then increasing the young generation space may help (the new/old ratio is set with -XX:NewRatio; the default depends on the JVM and collector, so check what yours actually uses).

Also paste the results of the GC logs here. The document I linked has some more useful settings and recommendations for the CMS collector. This collector stops the application threads twice during a collection cycle, so you need to check those pause times too.

Cheers & Pozdrav,
Igor
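
P.S. Before changing the young/old sizing it helps to see how full each generation actually is. A rough sketch, assuming only the standard java.lang.management API; the class name HeapPools is made up, and the pool names you get depend on the collector (with CMS they are usually Par Eden Space, Par Survivor Space and CMS Old Gen, but verify that on your JVM).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the thread or the Oracle document).
final class HeapPools {

    // One line per heap memory pool: name, used MB and committed MB.
    static List<String> describe() {
        final long mb = 1024L * 1024L;
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // Skip non-heap pools (such as the code cache) and pools that are no longer valid.
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue;
            }
            lines.add(pool.getName() + ": used " + (usage.getUsed() / mb)
                    + " MB, committed " + (usage.getCommitted() / mb) + " MB");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

Like any in-process check, this reports on the JVM it runs in, so for Tomcat it would need to be called from a webapp or the same beans read over JMX.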