We have a situation that has left our HBase database in a bad state. We
restarted a number of nodes this afternoon, and while HBase kept running, at
least one of our tables does not seem to be serving all of its regions. What
I'm seeing in the log is the java.io.EOFException stacktrace below, thrown
while reading a file in the recovered.edits directory. I looked around a bit,
and this looks like it might be related to HBASE-2933, which says that if the
master dies while trying to split a log it can leave invalid files in
recovered.edits. That seems plausible, since the master may have been one of
the nodes that was restarted today.
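To convince myself the file really is truncated mid-entry, rather than
unreadable for some other reason, I put together a small scanner along these
lines (just a rough sketch; I'm assuming HLog.getReader and HLog.Reader are
the right way into this 0.89-era WAL code, and that recovered.edits files are
readable with the same reader the region server uses):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.regionserver.wal.HLog;

public class ScanRecoveredEdits {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // full path to the edits file, e.g. the
    // recovered.edits/0000000000012154417 file from the stacktrace below
    Path edits = new Path(args[0]);
    FileSystem fs = edits.getFileSystem(conf);
    HLog.Reader reader = HLog.getReader(fs, edits, conf);
    long count = 0;
    try {
      while (reader.next() != null) {
        count++;
      }
      System.out.println("clean end of file after " + count + " edits");
    } catch (java.io.EOFException e) {
      // a log that was mid-split when the master died should fail here,
      // partway through its last entry
      System.out.println("EOFException after " + count + " edits at pos "
          + reader.getPosition() + ": " + e.getMessage());
    } finally {
      reader.close();
    }
  }
}

Running it against the file in the stacktrace does hit the EOFException after
reading some number of complete edits, which fits the mid-split theory.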
My question is: if this is indeed the case, is there a safe way to recover
from a situation where replaying recovered.edits files throws EOFExceptions?
My understanding is that the master splits the logs and places the results in
the recovered.edits directory. If I remove the files under recovered.edits,
would the master re-split the log file and recover properly, or would I have
data loss?
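If removing the file does turn out to be the answer, I would rather move it
aside than delete it outright, something like the sketch below (again just a
sketch, not something I've committed to; the holding directory in args[1] is
made up, and I'm only using the stock Hadoop FileSystem API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MoveAsideEdits {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path src = new Path(args[0]);    // the suspect recovered.edits file
    Path dstDir = new Path(args[1]); // a holding dir outside the region dir
    FileSystem fs = src.getFileSystem(conf);
    fs.mkdirs(dstDir);
    Path dst = new Path(dstDir, src.getName());
    // rename instead of delete, so the bytes are still around if the edits
    // turn out to be replayable after all
    if (!fs.rename(src, dst)) {
      throw new java.io.IOException("rename failed: " + src + " -> " + dst);
    }
    System.out.println("moved " + src + " to " + dst);
  }
}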
We are currently running the Cloudera distribution of HBase,
hbase-0.89.20100924.
Any insights on the best way to recover would be much appreciated.
22eb51f162.: java.io.EOFException: hdfs://hdnn.dfs.returnpath.net:54310/user/hbase/emailProperties/9171dadec62d81105f0f6022eb51f162/recovered.edits/0000000000012154417, entryStart=4160964, pos=4161536, end=4161536, edit=1306
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1503)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1468)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1380)
        at java.lang.Thread.run(Unknown Source)