What was the server carrying?  How many regions?  What kind of
load was on the cluster?  We should not be OOME'ing.  Do you have
the heap dump lying around?  (We dump heap on OOME; it's named *.hprof
or something.  If you have it, want to put it somewhere for me to pull
it so I can take a look?)  Any chance of errant big cells?  Lots of
them?  What JVM version?

St.Ack

On Wed, Jan 5, 2011 at 8:10 AM, Wayne <[email protected]> wrote:
> I am still struggling with the JVM. We just had a hard OOM crash of a region
> server after only running for 36 hours. Any help would be greatly
> appreciated. Do we need to restart nodes every 24 hours under load?  GC
> pauses are something we are trying to plan for, but full-out OOM crashes are
> a new problem.
>
> The message below seems to be where it starts going bad. It is followed by
> no less than 63 Concurrent Mode Failure errors over a 16 minute period.
>
> *GC locker: Trying a full collection because scavenge failed*
>
> Lastly here is the end (after the 63 CMF errors).
>
> Heap
>  par new generation   total 1887488K, used 303212K [0x00000005fae00000, 0x000000067ae00000, 0x000000067ae00000)
>  eden space 1677824K,  18% used [0x00000005fae00000, 0x000000060d61b078, 0x0000000661480000)
>  from space 209664K,   0% used [0x000000066e140000, 0x000000066e140000, 0x000000067ae00000)
>  to   space 209664K,   0% used [0x0000000661480000, 0x0000000661480000, 0x000000066e140000)
>  concurrent mark-sweep generation total 6291456K, used 2440155K [0x000000067ae00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 31704K, used 18999K [0x00000007fae00000, 0x00000007fccf6000, 0x0000000800000000)
>
> Here again are our custom settings in case there are some suggestions out
> there. Are we making it worse with these settings? What should we try next?
>
>        -XX:+UseCMSInitiatingOccupancyOnly
>        -XX:CMSInitiatingOccupancyFraction=60
>        -XX:+CMSParallelRemarkEnabled
>        -XX:SurvivorRatio=8
>        -XX:NewRatio=3
>        -XX:MaxTenuringThreshold=1
>
>
> Thanks!
>
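
As a rough cross-check of the quoted heap summary against the quoted
settings, the arithmetic can be sketched as below. This is only an
illustration: the numbers are copied from the message above, the function
and variable names are made up for the example, and it assumes the usual
HotSpot meanings of NewRatio (old generation : young generation) and
SurvivorRatio (eden : one survivor space).

```python
# Sizes in KB, copied from the heap summary quoted above.
EDEN_KB = 1677824
SURVIVOR_KB = 209664      # "from" and "to" spaces are the same size
OLD_KB = 6291456          # concurrent mark-sweep generation, total
OLD_USED_KB = 2440155     # concurrent mark-sweep generation, used

# Value copied from the quoted -XX:CMSInitiatingOccupancyFraction setting.
CMS_INITIATING_OCCUPANCY_PERCENT = 60


def heap_layout():
    """Derive ratios from the summary (hypothetical helper, illustration only)."""
    # Young generation = eden plus both survivor spaces. The "par new
    # generation total" line reports eden plus one survivor space only.
    young_kb = EDEN_KB + 2 * SURVIVOR_KB
    return {
        "young_kb": young_kb,
        "heap_kb": young_kb + OLD_KB,
        "old_to_young": OLD_KB / young_kb,            # compare with NewRatio=3
        "eden_to_survivor": EDEN_KB / SURVIVOR_KB,    # compare with SurvivorRatio=8
        "old_used_percent": 100.0 * OLD_USED_KB / OLD_KB,
        "cms_trigger_kb": OLD_KB * CMS_INITIATING_OCCUPANCY_PERCENT // 100,
    }


if __name__ == "__main__":
    for name, value in heap_layout().items():
        print(f"{name}: {value}")
```

By this arithmetic the sizes are consistent with an 8 GB heap split 3:1
between old and young generations, and the old generation occupancy in the
final summary is below the 60% CMS initiating threshold. That describes
only the state at exit; it says nothing about occupancy during the
concurrent mode failures.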
