It is OK to delete the hfile in question with a Hadoop file system command. No restart of HBase is needed. You may see some exceptions if there are operations (user scans, compactions) in flight, but it will be fine.
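For example, something like the following, where the path components are placeholders for the actual table, encoded region name, column family, and hfile name:

  hadoop fs -rm /hbase/<table>/<encoded-region>/<family>/<hfile>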
Jerry

On Thu, Mar 19, 2015 at 12:27 PM, Mike Dillon <[email protected]> wrote:

> So, it turns out that the client has an archived data source that can recreate the HBase data in question if needed, so the need for me to actually recover this HFile has diminished to the point where it's probably not worth investing my time in creating a custom tool to extract the data.
>
> Given that they're willing to lose the data in this region and recreate it if necessary, do I simply need to delete the HFile to make HDFS happy, or is there something I need to do at the HBase level to tell it that the data will be going away?
>
> Thanks so much everyone for your help on this issue!
>
> -md
>
> On Wed, Mar 18, 2015 at 10:46 PM, Jerry He <[email protected]> wrote:
>
> > From the HBase perspective, since we don't have a ready tool, the general idea is that you will need access to the HBase source code and will have to write your own tool. At a high level, the tool would read/scan the KVs from the hfile, similar to what the HFile tool does, while opening an HFileWriter to dump the good data until you are no longer able to do so. Then you would close the HFileWriter with the necessary meta file info. There are APIs in HBase to do this, but they may not be external public APIs.
> >
> > Jerry
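A minimal sketch of the salvage approach Jerry outlines above, assuming 0.94-era HFile APIs (HFile.createReader, HFile.getWriterFactory); the reader/writer signatures differ across HBase versions, so treat this as an outline of the approach rather than a supported tool:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.io.hfile.CacheConfig;
  import org.apache.hadoop.hbase.io.hfile.HFile;
  import org.apache.hadoop.hbase.io.hfile.HFileScanner;

  public class HFileSalvage {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      FileSystem fs = FileSystem.get(conf);
      Path src = new Path(args[0]);  // the corrupt hfile
      Path dst = new Path(args[1]);  // where to write the salvaged copy

      CacheConfig cacheConf = new CacheConfig(conf);
      HFile.Reader reader = HFile.createReader(fs, src, cacheConf);
      HFile.Writer writer = HFile.getWriterFactory(conf, cacheConf)
          .withPath(fs, dst)
          .create();
      long count = 0;
      try {
        HFileScanner scanner = reader.getScanner(false, false);
        if (scanner.seekTo()) {
          do {
            // Copy KVs in order until the first corruption-caused exception.
            KeyValue kv = scanner.getKeyValue();
            writer.append(kv);
            count++;
          } while (scanner.next());
        }
      } catch (Exception e) {
        System.err.println("Stopping after " + count + " good KVs: " + e);
      } finally {
        // Closing the writer emits the trailer and file info, so the
        // partial output is itself a readable hfile.
        writer.close();
        reader.close();
      }
    }
  }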
> > On Wed, Mar 18, 2015 at 4:27 PM, Mike Dillon <[email protected]> wrote:
> >
> > > I've had a chance to try out Stack's passed-along suggestion of HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat and managed to get this: https://gist.github.com/md5/d42e97ab7a0bd656f09a
> > >
> > > After knowing what to look for, I was able to find the same checksum failures in the logs during the major compaction failures.
> > >
> > > I'm willing to accept that all the data after that point in the corrupt block is lost, so any specific advice on how to replace that block with a partial one containing only the good data would be appreciated. I'm aware that there may be other checksum failures in the subsequent blocks as well, since nothing is currently able to read past the first corruption point, but I'll just have to wash, rinse, and repeat to see how much good data is left in the file as a whole.
> > >
> > > -md
> > >
> > > On Wed, Mar 18, 2015 at 2:41 PM, Jerry He <[email protected]> wrote:
> > >
> > > > For a 'fix' and 'recover' hfile tool at the HBase level, the relatively easy thing to recover is probably the data (KVs) up to the point where we hit the first corruption-caused exception. After that, it will not be as easy. For example, if the current key length or value length is bad, there is no way to skip to the next KV. We would probably need to skip the whole current hblock and go to the next block for KVs, assuming the hblock index is still good.
> > > >
> > > > HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949> makes an incremental improvement to ensure we do get a corruption-caused exception, so that the scan/read will not go into an infinite loop.
> > > >
> > > > Jerry
> > > >
> > > > On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <[email protected]> wrote:
> > > >
> > > > > I haven't filed one myself, but I can do so if my investigation ends up finding something bug-worthy as opposed to just random failures due to out-of-disk scenarios.
> > > > >
> > > > > Unfortunately, I had to prioritize some other work this morning, so I haven't made it back to the bad node yet.
> > > > >
> > > > > I did attempt restarting the datanode to see if I could make hadoop fsck happy, but that didn't have any noticeable effect. I'm hoping to have more time this afternoon to investigate the other suggestions from this thread.
> > > > >
> > > > > -md
> > > > >
> > > > > On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <[email protected]> wrote:
> > > > >
> > > > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack <[email protected]> wrote:
> > > > > >
> > > > > > > > If it's possible to recover all of the file except a portion of the affected block, that would be OK too.
> > > > > > >
> > > > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to add it so you can recover all but the bad block (we should figure out how to skip the bad section also).
> > > > > >
> > > > > > I was just getting caught up on this thread and had the same thought. Is there an issue filed for this?
> > > > > >
> > > > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack <[email protected]> wrote:
> > > > > >
> > > > > > > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi all-
> > > > > > > >
> > > > > > > > I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was hoping to get some advice on recovering as much data as possible.
> > > > > > > >
> > > > > > > > When I examined the blk-* file on the three data nodes that have a replica of the affected block, I saw that the replicas on two of the datanodes had the same SHA-1 checksum and that the replica on the other datanode was a truncated version of the replica found on the other nodes (as reported by a difference at EOF by "cmp"). The size of the two identical blocks is 67108864, the same as most of the other blocks in the file.
> > > > > > > >
> > > > > > > > Given that there were two datanodes with the same data and another with truncated data, I made a backup of the truncated file and dropped the full-length copy of the block in its place directly on the data mount, hoping that this would cause HDFS to no longer report the file as corrupt. Unfortunately, this didn't seem to have any effect.
> > > > > > >
> > > > > > > That seems like a reasonable thing to do.
> > > > > > >
> > > > > > > Did you restart the DN that was serving this block before you ran fsck? (Fsck asks the namenode what blocks are bad; it is likely still reporting off old info.)
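A concrete way to re-check after bouncing the datanode (the path below is a placeholder for the affected file):

  hdfs fsck /hbase/<table>/<encoded-region>/<family>/<hfile> -files -blocks -locations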
> > > > > > > > Looking through the Hadoop source code, it looks like there is a CorruptReplicasMap internally that tracks which nodes have "corrupt" copies of a block. In HDFS-6663 <https://issues.apache.org/jira/browse/HDFS-6663>, a "-blockId" parameter was added to "hadoop fsck" to allow dumping the reason that a block id is considered corrupt, but that wasn't added until Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > > > > >
> > > > > > > Good digging.
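On a 2.7.0+ client, that check would look something like the following, with a placeholder block id:

  hdfs fsck -blockId blk_<id>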
> > > > > > > > I also had a look at running the "HFile" tool on the affected file (cf. section 9.7.5.2.2 at http://hbase.apache.org/0.94/book/regions.arch.html). When I did that, I was able to see the data up to the corrupted block as far as I could tell, but then it started repeatedly looping back to the first row and starting over. I believe this is related to the behavior described in https://issues.apache.org/jira/browse/HBASE-12949
> > > > > > >
> > > > > > > So, your file is 3G and your blocks are 128M?
> > > > > > >
> > > > > > > The dfsclient should just pass over the bad replica and move on to the good one, so it would seem to indicate all replicas are bad for you.
> > > > > > >
> > > > > > > If you enable DFSClient DEBUG level logging, it should report which blocks it is reading from. For example, here I am reading the start of the index blocks with DFSClient DEBUG enabled, but I grep out the DFSClient emissions only:
> > > > > > >
> > > > > > > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase org.apache.hadoop.hbase.io.hfile.HFile -h -f /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362 | grep DFSClient
> > > > > > > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 available
> > > > > > > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C available
> > > > > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > > > > SLF4J: Found binding in [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > > > SLF4J: Found binding in [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > > > > > > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > > > > > > 2015-03-17 21:42:58,082 INFO [main] hfile.CacheConfig: CacheConfig:disabled
> > > > > > > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo = LocatedBlocks{
> > > > > > >   fileLength=108633903
> > > > > > >   underConstruction=false
> > > > > > >   blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.30:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > > > >   lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.27:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > > > >   isLastBlockComplete=true}
> > > > > > > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo = LocatedBlocks{
> > > > > > >   fileLength=108633903
> > > > > > >   underConstruction=false
> > > > > > >   blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.27:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > > > >   lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.30:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > > > >   isLastBlockComplete=true}
> > > > > > > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.30:50011
> > > > > > > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > >
> > > > > > > Do you see it reading from 'good' or 'bad' blocks?
> > > > > > >
> > > > > > > I added this line to hbase log4j.properties to enable DFSClient DEBUG:
> > > > > > >
> > > > > > > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > > > > > >
> > > > > > > On HBASE-12949, what exception is coming up? Dump it in here.
> > > > > > >
> > > > > > > > My goal is to determine whether the block in question is actually corrupt and, if so, in what way.
> > > > > > >
> > > > > > > What happens if you just try to copy the file local or elsewhere in the filesystem using the dfs shell? Do you get a pure dfs exception unhampered by hbaseyness?
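For example, something like the following, with a placeholder source path and an arbitrary local destination:

  hdfs dfs -get /hbase/<table>/<encoded-region>/<family>/<hfile> /tmp/hfile.copy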
> > > > > > > > If it's possible to recover all of the file except a portion of the affected block, that would be OK too.
> > > > > > >
> > > > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to add it so you can recover all but the bad block (we should figure out how to skip the bad section also).
> > > > > > >
> > > > > > > > I just don't want to be in the position of having to lose all 3 gigs of data in this particular region, given that most of it appears to be intact. I just can't find the right low-level tools to let me diagnose the exact state and structure of the block data I have for this file.
> > > > > > >
> > > > > > > Nod.
> > > > > > >
> > > > > > > > Any help or direction that someone could provide would be much appreciated. For reference, I'll repeat that our client is running Hadoop 2.0.0-cdh4.6.0 and add that the HBase version is 0.94.15-cdh4.6.0.
> > > > > > >
> > > > > > > See if any of the above helps. I'll try and dig up some more tools in the meantime.
> > > > > > > St.Ack
> > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > -md
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > >
> > > > > >    - Andy
> > > > > >
> > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
