Hi,
We happened to several odd problems with little probability. Both of those
problems come along with some un-wrapped exception in ServerShutdownHandler,
like the following two issues:
1)" java.lang.AssertionError: Could not find target position 5368709120":
2011-07-07 13:28:06,118 ERROR org.apache.hadoop.hbase.executor.EventHandler:
Caught throwable while processing event M_META_SERVER_SHUTDOWN
java.lang.AssertionError: Could not find target position 5368709120
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockAt(DFSClient.java:2157)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2246)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2441)
at java.io.DataInputStream.read(DataInputStream.java:132)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
at
org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1984)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1884)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
at
org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:201)
at
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:114)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2)"IOException":
2011-07-09 01:38:31,836 ERROR org.apache.hadoop.hbase.master.MasterFileSystem:
Failed splitting
hdfs://162.2.16.6:9000/hbase/.logs/162-2-6-187,20020,1310107719056
java.io.IOException:
hdfs://162.2.16.6:9000/hbase/.logs/162-2-6-187,20020,1310107719056/162-2-6-187%3A20020.1310143885352,
entryStart=1878997244, pos=1879048192, end=2003890606, edit=80274
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:244)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:200)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
at
org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:201)
at
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:114)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
These exceptions could cause the ServerShutdownHandler process unexpect exit.
So if the shutted regionserver carries the -ROOT- or .META. region, they won't
be open again unless restart the cluster.
So should we take a better safeguard in it?
Regards,
Jieshan