> > Why use the following in your config?
> I use these GC tuning options because I found them somewhere on the
> mailing list, advertised as generally recommended GC options. I think it
> would be nice if the HBase ref guide recommended default GC settings; I
> can imagine they differ for different heap sizes.
 [Sandy Pratt] 


Fair enough.  IIRC hbase-env.sh comes with "-ea -XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode" out of the box so I took that to be the default.  It's 
my understanding that args like the "max GC pause" one are hints to the 
ergonomics engine more than anything else.  I believe the intention is that you 
can specify performance characteristics and let the JVM work out generation 
ratios and such [1].  There's nothing wrong with using them; I was just 
curious about the reasoning behind them.

1: http://docs.oracle.com/javase/1.5.0/docs/guide/vm/gc-ergonomics.html
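For illustration, ergonomics-style tuning of the kind described above would look something like this in hbase-env.sh. The specific values here are placeholders for the sake of the example, not recommendations:

```shell
# Sketch of ergonomics-style GC tuning: state the performance goals
# (pause time, throughput) and let the JVM work out generation sizes.
# The values below are illustrative only -- tune for your workload.
export HBASE_OPTS="$HBASE_OPTS \
  -XX:MaxGCPauseMillis=100 \
  -XX:GCTimeRatio=19"
```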


> 
> > It's possible that you actually are in swap due to uncollected
> > off-heap memory allocations.  I doubt that even severe fragmentation
> > on the heap would cause that kind of slowdown
> Munin does show some minor swapping (and memory overcommit), but
> considering the amount of free space left (OS disk cache) and the fact that
> swappiness is set to zero, I was under the impression that it was harmless.
> On second thought, I will dig deeper into this.
 
[Sandy Pratt] 
I bring this up because I've had real problems with it, and my initial 
intuition was dead wrong.  My advice is to look at the process size reported by 
top at various points in your execution.  In some situations, it's not uncommon 
to see JVM processes with 2 GB heaps have 6-7 GB virtual sizes and bloated 
resident sizes as well.  That's not necessarily a problem in all cases.  For 
example, if a process has a large virtual size because of a bunch of files 
mapped in read-only mode, that's not a big deal.  I found that in many of my 
boxes, the oversized memory footprint was correlated with crashes due to long 
GC.  My best guess was that the problem was due to direct-allocated byte 
buffers not being cleaned up often enough, and when they finally were, I would 
be effectively GCing in swap, which is a death sentence.
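A quick way to do the check described above, without watching top interactively (using the current shell's PID, $$, as a stand-in; for a region server you would instead get the PID from something like `pgrep -f HRegionServer`):

```shell
# Print PID, virtual size (VSZ) and resident size (RSS), both in KiB,
# for a single process.  A VSZ far above heap + expected mappings is
# the symptom discussed above.
PID=$$   # stand-in; substitute the region server's PID
ps -o pid=,vsz=,rss= -p "$PID"
```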

I, and a few other people I've spoken to, have had some success with 
'-XX:+UseParallelOldGC -XX:MaxDirectMemorySize=128m'.  I think the first 
argument is more important than the second.  This goes against the HBase 
defaults, so take it with a grain of salt.
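If you want to try those two flags, they would go into hbase-env.sh along these lines (a sketch; the 128m direct-memory cap is the value mentioned above, so adjust it to your own off-heap usage):

```shell
# Sketch: use the parallel old-generation collector instead of CMS,
# and cap direct (off-heap) byte buffer allocations.  This deviates
# from the stock hbase-env.sh settings, so test before rolling out.
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseParallelOldGC \
  -XX:MaxDirectMemorySize=128m"
```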

I don't know for sure if you have the same problem, but if you suspect 
something it can't hurt to paste some output from top here.

Which JVM are you running?


> > If I'm reading your log correctly, you have about 2.5 GB of heap, right?
> That's right, 2100100100 bytes to be exact.
> 
> > Does this server have the same load as the other ones?
> Yes. They all run about the same amount of regions and generally have the
> same load. The hardware is (should be) identical.
[Sandy Pratt] 

Maybe it tends to be serving a hotspot region each time.  ISTR there was an 
HBase feature to reassign regions to the same server each time; that could be 
what's happening.
