hbck does show the hdfs files still present, without any associated regions. I probably could have recovered had I noticed just after this happened, but given that we've been running like this for over a week, and that there is the potential for collisions between the missing and new data, I'm probably just going to manually reinsert it all using the hdfs files.
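
For anyone curious, the reinsert will probably look something like the below (paths and table name are placeholders; this assumes the stock HFile tool and bulk loader that ship with HBase):

    # Inspect an orphaned HFile to confirm what data it holds
    hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -f /hbase/mytable/<encoded-region>/cf/<hfile>

    # Bulk-load recovered HFiles back into the table; the input directory
    # must contain one subdirectory per column family
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /recovered/mytable mytable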

Hadoop version is 1.0.1, btw.

Thanks.

--Brennon

On 4/11/13 11:05 PM, Ted Yu wrote:
Brennon:
Have you run hbck to diagnose the problem ?

Since the issue might have involved hdfs, browsing DataNode log(s) may
provide some clue as well.
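
For example, something along these lines (the log location varies by install; the path here is just a guess):

    # scan DataNode logs around the time of the halt for block errors
    grep -iE 'error|exception' /var/log/hadoop/*datanode*.log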

What hadoop version are you using ?

Cheers

On Thu, Apr 11, 2013 at 10:58 PM, ramkrishna vasudevan <
[email protected]> wrote:

When you say that the parent regions got reopened, does that mean that you
did not lose any data (i.e., no data became unreadable)? The reason I am
asking is that if, after the parent got split into daughters and the data
was written to the daughters, the daughters' files could not be opened, you
could have ended up unable to read the data.
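
One quick check (the region path below is a placeholder): freshly split daughters hold reference files, named <storefile>.<parent-encoded-name>, until a compaction rewrites them, so listing a daughter's directory shows whether its files are even present:

    hadoop fs -lsr /hbase/mytable/<daughter-encoded-region-name>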

Some logs could tell us what made the parents get reopened rather than the
daughters. Another thing I would like to ask: was the cluster brought down
abruptly by killing the RS?

Which version of HBase?

Regards
Ram




On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church <[email protected]>
wrote:

Hello,

I had an interesting problem come up recently. We have a few thousand
regions across 8 datanode/regionservers. I made a change, increasing the
heap size for hadoop from 128M to 2048M, which ended up bringing the
cluster to a complete halt after about 1 hour. I reverted back to 128M and
turned things back on again, but didn't realize at the time that I came up
with 9 fewer regions than I started with. Upon further investigation, I
found that all 9 missing regions were from splits that occurred while the
cluster was running, after making the heap change and before it came to a
halt. There was a 10th region (5 splits involved in total) that managed to
get recovered. The really odd thing is that in the case of the other 9
regions, the original parent regions, which as far as I can tell from the
logs were deleted, were re-opened upon restarting things once again. The
daughter regions were gone. Interestingly, I found the orphaned datablocks
still intact, and in at least some cases have been able to extract the data
from them and will hopefully re-add it to the tables.
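
For reference, that heap setting is the standard knob in conf/hadoop-env.sh (the value is in MB):

    # conf/hadoop-env.sh
    export HADOOP_HEAPSIZE=2048    # was 128; reverted after the halt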

My question is this. Does anyone know, based on the rather muddled
description I've given above, what could possibly have happened here? My
best guess is that the bad state hdfs was in caused some critical step of
the split process to be missed, which resulted in the references to the
parent regions sticking around and the references to the daughter regions
being lost.

Thanks for any insight you can provide.

--Brennon





