You could move the problematic file aside to get going again.
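Something like the below should do it (paths lifted from your stacktrace; the
destination is just a holding directory I made up, anywhere outside the region
dir is fine -- hang on to the file in case we need it for debugging):

  # make a holding dir, then move the bad edits file out of the region dir
  hadoop fs -mkdir /user/hbase/edits-aside
  hadoop fs -mv \
      /user/hbase/emailProperties/9171dadec62d81105f0f6022eb51f162/recovered.edits/0000000000012154417 \
      /user/hbase/edits-aside/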

That we're dying on an EOFException definitely warrants more digging, since
we're supposed to handle these on replay.  I'll take a look in the morning (I
seem to recall fixes around EOFExceptions in this area before we released
0.90.0 -- need to dig them up).

St.Ack

On Wed, Jan 26, 2011 at 6:22 PM, Andy Sautins
<[email protected]> wrote:
>
>   We have a situation that has left our HBase database in a bad state.
> We restarted a number of nodes this afternoon, and while HBase kept
> running, at least one of our tables does not seem to be serving all of its
> regions.  What I'm seeing in the log is the java.io.EOFException stacktrace
> below, thrown while trying to read a file in the recovered.edits directory.
> I looked around a bit and it seems this might be related to HBASE-2933,
> which suggests that if the master dies while splitting a log it can leave
> invalid logs in recovered.edits.  That seems plausible, as the master may
> well have been one of the nodes restarted today.
>
>   My question is: if this is indeed the case, is there a safe way to
> recover from a situation where replaying the recovered.edits files throws
> EOF exceptions?  My understanding is that the master splits the logs and
> places them in the recovered.edits directory.  If I remove the files under
> recovered.edits, would the master re-split the log file and recover
> properly, or would I lose data?
>
>   We are currently running the Cloudera distribution of HBase,
> hbase-0.89.20100924.
>
>   Any insights on the best way to recover would be much appreciated.
>
> 22eb51f162.: java.io.EOFException: hdfs://hdnn.dfs.returnpath.net:54310/user/hbase/emailProperties/9171dadec62d81105f0f6022eb51f162/recovered.edits/0000000000012154417, entryStart=4160964, pos=4161536, end=4161536, edit=1306
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>        at java.lang.reflect.Constructor.newInstance(Unknown Source)
>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
>        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
>        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
>        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
>        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1503)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1468)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1380)
>        at java.lang.Thread.run(Unknown Source)
>
>
>
