Try to run "hbase hbck -fix".
It should do the job.
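
For reference, a minimal sequence might look like this (a plain "hbase hbck" run with no options only reports, and the exact repair flags vary a bit by HBase version):

  hbase hbck        # report-only pass, lists any inconsistencies it finds
  hbase hbck -fix   # attempt to repair the reported inconsistencies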

Thank you!

Sincerely,
Leonid Fedotov

On Apr 12, 2013, at 9:56 AM, Brennon Church wrote:

> hbck does show the hdfs files there without associated regions.  I probably 
> could have recovered had I noticed just after this happened, but given that 
> we've been running like this for over a week, and that there is the potential 
> for collisions between the missing and new data, I'm probably just going to 
> manually reinsert it all using the hdfs files.
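> 
> One option for pulling the rows back out of those orphaned files is HBase's
> HFile pretty-printer, something along these lines (the path is just a
> placeholder for wherever the orphaned files ended up):
> 
>   hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f hdfs:///hbase/<table>/<region>/<family>/<hfile>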
> 
> Hadoop version is 1.0.1, btw.
> 
> Thanks.
> 
> --Brennon
> 
> On 4/11/13 11:05 PM, Ted Yu wrote:
>> Brennon:
>> Have you run hbck to diagnose the problem ?
>> 
>> Since the issue might have involved hdfs, browsing DataNode log(s) may
>> provide some clue as well.
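>> 
>> For example, something along these lines on each DataNode (log directory
>> and file naming depend on your install):
>> 
>>   grep -iE 'exception|error' /var/log/hadoop/*datanode*.log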
>> 
>> What hadoop version are you using ?
>> 
>> Cheers
>> 
>> On Thu, Apr 11, 2013 at 10:58 PM, ramkrishna vasudevan <
>> [email protected]> wrote:
>> 
>>> When you say that the parent regions got reopened, does that mean that you
>>> did not lose any data (i.e. no data became unreadable)?  The reason I am
>>> asking is that if the parent got split into daughters, the data was written
>>> to the daughters, and the daughters' files could then not be opened, you
>>> could have ended up unable to read that data.
>>> 
>>> Some logs could tell us what made the parent get reopened rather than the
>>> daughters.  Another thing I would like to ask: was the cluster brought
>>> down abruptly by killing the RS?
>>> 
>>> Which version of HBase?
>>> 
>>> Regards
>>> Ram
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church <[email protected]>
>>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I had an interesting problem come up recently.  We have a few thousand
>>>> regions across 8 datanode/regionservers.  I made a change, increasing the
>>>> heap size for hadoop from 128M to 2048M, which ended up bringing the
>>>> cluster to a complete halt after about 1 hour.  I reverted back to 128M
>>>> and turned things back on again, but didn't realize at the time that I
>>>> came up with 9 fewer regions than I started with.  Upon further
>>>> investigation, I found that all 9 missing regions were from splits that
>>>> occurred while the cluster was running, after making the heap change and
>>>> before it came to a halt.  There was a 10th region (5 splits involved in
>>>> total) that managed to get recovered.  The really odd thing is that in the
>>>> case of the other 9 regions, the original parent regions, which as far as
>>>> I can tell from the logs were deleted, were re-opened upon restarting
>>>> things once again.  The daughter regions were gone.  Interestingly, I
>>>> found the orphaned datablocks still intact, and in at least some cases
>>>> have been able to extract the data from them and will hopefully re-add it
>>>> to the tables.
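>>>> 
>>>> If the extracted files are still valid HFiles, one route I may look at for
>>>> re-adding them is the bulk loader, roughly like this (assuming the files
>>>> are arranged under per-column-family subdirectories; the path and table
>>>> name here are only placeholders):
>>>> 
>>>>   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs:///recovered/<table> <table>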
>>>> 
>>>> My question is this.  Does anyone know, based on the rather muddled
>>>> description I've given above, what could possibly have happened here?  My
>>>> best guess is that the bad state that hdfs was in caused some critical
>>>> component of the split process to be missed, which resulted in a reference
>>>> to the parent regions sticking around and the references to the daughter
>>>> regions being lost.
>>>> 
>>>> Thanks for any insight you can provide.
>>>> 
>>>> --Brennon
>>>> 
>>>> 
>>>> 
>>>> 
> 
> 
