Which release are you using? In 0.98 and later, there is JvmPauseMonitor, which logs when the JVM or the host machine has been paused.
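If you are on an older release, the idea behind a pause monitor is simple enough to sketch. The block below is a hypothetical illustration of the technique, not the actual HBase JvmPauseMonitor source. A daemon thread repeatedly sleeps for a fixed interval and measures how much longer than requested the sleep took. Any large overshoot means the whole JVM (or the VM it runs in) was stalled. It then compares the garbage collectors' accumulated time before and after, which hints at whether GC accounts for the stall. The class name `PauseDetector`, the 500 ms interval, the 1 s report threshold and the "at least half" heuristic in `classify` are all assumptions made for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of sleep-and-measure pause detection.
// NOT the HBase JvmPauseMonitor source; names and thresholds are assumptions.
class PauseDetector implements Runnable {
    static final long SLEEP_MS = 500;             // assumed polling interval
    static final long REPORT_THRESHOLD_MS = 1000; // assumed: report stalls over 1 s

    /** Total accumulated collection time (ms) across all garbage collectors. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t; // -1 means the collector does not report a time
            }
        }
        return total;
    }

    /** Assumed heuristic: blame GC if the collectors account for at least half the stall. */
    static String classify(long extraMs, long gcMs) {
        return (gcMs * 2 >= extraMs) ? "GC" : "non-GC";
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long gcBefore = totalGcTimeMs();
            long start = System.nanoTime();
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop
                return;
            }
            // How much longer than requested did the sleep take?
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extraMs = elapsedMs - SLEEP_MS;
            if (extraMs > REPORT_THRESHOLD_MS) {
                long gcMs = totalGcTimeMs() - gcBefore;
                System.err.printf("Pause of ~%d ms detected (%s; GC time during pause: %d ms)%n",
                        extraMs, classify(extraMs, gcMs), gcMs);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread monitor = new Thread(new PauseDetector(), "pause-detector");
        monitor.setDaemon(true); // do not keep the JVM alive just for monitoring
        monitor.start();
        Thread.sleep(2000);      // stand-in for real application work
        monitor.interrupt();
        monitor.join();
    }
}
```

A long overshoot with little GC time reported points away from the collector. On a VM, for example, the hypervisor descheduling the guest or the host swapping would look like that.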
Cheers

On Tue, Jun 10, 2014 at 11:05 AM, Tom Brown <[email protected]> wrote:

> Last night a regionserver in my cluster stopped responding in a timely
> manner for about 20 minutes. I know that stop-the-world GC can cause this
> type of behavior, but 20 minutes seems excessive.
>
> The server is a 2-core VM with 16GB of RAM (HBase max heap is 12GB). We
> are using the latest Java 7 from Oracle. HDFS is provided by an Isilon
> cluster.
>
> The server workload is read/write: the writing process reads all rows it is
> about to write, updates them if they exist, and then writes all the rows
> (replacing ones that were updated).
>
> The last messages before the pause were regarding an HLog roll:
>
> DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultReplication
> INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultBlockSize
>
> During the next 20 minutes there were a handful of sporadic LruBlockCache
> stats messages but nothing else. After 20 minutes, normal operation
> resumed.
>
> Is 20 minutes for a GC pause expected given the operational load and
> machine specs? Could a GC pause include periodic log messages? If it wasn't
> a GC pause, what else could it be?
>
> --Tom
