Is this a long GC pause, or something else?

Tom Brown Tue, 10 Jun 2014 11:06:38 -0700

Last night a regionserver in my cluster stopped responding in a timely
manner for about 20 minutes. I know that stop-the-world GC can cause this
type of behavior, but 20 minutes seems excessive.


The server is a 2-core VM with 16GB of RAM (HBase max heap is 12GB). We
are using the latest Java 7 from Oracle. HDFS is provided by an Isilon
cluster.

The server workload is read/write: the writing process reads all rows it is
about to write, updates them if they exist, and then writes all the rows
(replacing ones that were updated).

The last messages before the pause were regarding an HLog roll:

DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultReplication
INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultBlockSize

During the next 20 minutes there were a handful of sporadic LruBlockCache
stats messages but nothing else. After 20 minutes, normal operation resumed.

Is 20 minutes for a GC pause expected given the operational load and
machine specs? Could a GC pause include periodic log messages? If it wasn't
a GC pause, what else could it be?
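
One way these cases might be told apart is sketched below. A thread sleeps for a fixed interval and reports how far the wake-up overshot. A stop-the-world collection suspends all Java application threads, so a thread like this could not print anything while one is in progress, and would report the overshoot once it woke up. The class name, interval, and threshold are hypothetical and not part of HBase. The sketch also cannot distinguish a GC pause from the VM itself being starved of CPU or swapping; it only shows that the JVM was not running.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stall detector; an illustration only, not an HBase class. */
public class PauseDetector {
    static final long SLEEP_MS = 500;   // assumed polling interval
    static final long WARN_MS = 1000;   // assumed reporting threshold

    /** Milliseconds by which the observed sleep exceeded the expected one (never negative). */
    static long overshootMillis(long expectedMs, long startNanos, long endNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0, elapsedMs - expectedMs);
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded loop so the sketch terminates; a real monitor would run as a daemon thread.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            Thread.sleep(SLEEP_MS);
            long extra = overshootMillis(SLEEP_MS, start, System.nanoTime());
            if (extra > WARN_MS) {
                // Printed only after the stall ends, never during it.
                System.err.println("JVM or host stalled for about " + extra + " ms");
            }
        }
    }
}
```

If something like this reported one overshoot of roughly 20 minutes, a single pause would fit. Messages timestamped inside the window, such as the LruBlockCache stats, would instead suggest the JVM was running at least intermittently.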

--Tom
