Hi

I'm not sure whether this has been observed before, but I have been running
into issues with a hung master.

This is on a 700-node cluster running HBase 0.90.0 and Hadoop 0.20.x.

Every now and then, the master stops responding: every request throws a
MasterNotRunningException, and the master stays unresponsive even though the
master process is still alive. The last line in the master log, written shortly
after startup, is "Waiting for split writer threads to finish" (log messages
shown below), and no further messages appear even after a couple of hours.
Running jstack -F on the process throws a DebuggerException on every thread and
reports no deadlocks.
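Since jstack -F fails on every thread here, one possible alternative is to pull the thread dump over JMX instead of through the HotSpot debugger. The sketch below uses only the standard java.lang.management and javax.management.remote APIs. It assumes (not confirmed by the report) that the master JVM was started with remote JMX enabled via the usual -Dcom.sun.management.jmxremote.* flags; the class name ThreadDumper and the port in the usage line are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: dumps all thread stacks of a (possibly remote) JVM via ThreadMXBean.
public class ThreadDumper {

    // Formats name, state and stack of every thread, plus a deadlock count.
    static String dumpThreads(ThreadMXBean threads) {
        StringBuilder sb = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        // findDeadlockedThreads() returns null when no threads are deadlocked
        long[] deadlocked = threads.findDeadlockedThreads();
        sb.append("deadlocked threads: ")
          .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No URL given: dump this JVM only, as a smoke test.
            System.out.print(dumpThreads(ManagementFactory.getThreadMXBean()));
            return;
        }
        // ASSUMPTION: the master JVM exposes remote JMX at the given service URL.
        JMXServiceURL url = new JMXServiceURL(args[0]);
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ThreadMXBean remote = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            System.out.print(dumpThreads(remote));
        }
    }
}
```

Usage would look like `java ThreadDumper service:jmx:rmi:///jndi/rmi://<master-host>:10101/jmxrmi`, where 10101 stands in for whatever JMX port the master actually uses. The stacks of the WriterThread-N threads and of the thread logging "Waiting for split writer threads to finish" would be the ones to look at first.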

Is there any other way to monitor the master? I didn't observe this in my
small-scale (40-node) tests, where HBase 0.90 worked just fine.

Cheers
Vidhya





2011-01-28 07:35:49,866 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/b3110270.yst.yahoo.net,60020,1296199618314 belongs to an existing region server
2011-01-28 07:35:49,866 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/b3110271.yst.yahoo.net,60020,1296183218615 doesn't belong to a known region server, splitting
2011-01-28 07:35:49,867 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1 hlog(s) in hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/b3110271.yst.yahoo.net,60020,1296183218615
2011-01-28 07:35:49,867 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-0,5,main]: starting
2011-01-28 07:35:49,867 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-1,5,main]: starting
2011-01-28 07:35:49,867 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 1: hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/b3110271.yst.yahoo.net,60020,1296183218615/b3110271.yst.yahoo.net%3A60020.1296183219266, length=0
2011-01-28 07:35:49,867 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-2,5,main]: starting
2011-01-28 07:35:49,867 WARN org.apache.hadoop.hbase.util.FSUtils: Running on HDFS without append enabled may result in data loss
2011-01-28 07:35:49,867 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/b3110271.yst.yahoo.net,60020,1296183218615/b3110271.yst.yahoo.net%3A60020.1296183219266 might be still open, length is 0
2011-01-28 07:35:49,869 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Could not open hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/b3110271.yst.yahoo.net,60020,1296183218615/b3110271.yst.yahoo.net%3A60020.1296183219266 for reading. File is emptyjava.io.EOFException
2011-01-28 07:35:49,875 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/b3110271.yst.yahoo.net,60020,1296183218615/b3110271.yst.yahoo.net%3A60020.1296183219266 to hdfs://b3110120.yst.yahoo.net:4600/hbase/.oldlogs/b3110271.yst.yahoo.net%3A60020.1296183219266
2011-01-28 07:35:49,877 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Waiting for split writer threads to finish
