Brennon:
Have you run hbck to diagnose the problem? Since the issue might have involved HDFS, browsing the DataNode log(s) may provide some clues as well.
What Hadoop version are you using?

Cheers

On Thu, Apr 11, 2013 at 10:58 PM, ramkrishna vasudevan <[email protected]> wrote:

> When you say that the parent regions got reopened, does that mean that
> you did not lose any data (i.e., there was no data that could not be
> read)? The reason I am asking is that if the parent was split into
> daughters, data was written to the daughters, and the daughters' files
> then could not be opened, you could have ended up unable to read that
> data.
>
> Some logs could tell us what caused the parent to be reopened rather
> than the daughters. Another thing I would like to ask: was the cluster
> brought down abruptly by killing the RS?
>
> Which version of HBase?
>
> Regards
> Ram
>
> On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church <[email protected]> wrote:
>
> > Hello,
> >
> > I had an interesting problem come up recently. We have a few thousand
> > regions across 8 datanode/regionservers. I made a change, increasing
> > the heap size for hadoop from 128M to 2048M, which ended up bringing
> > the cluster to a complete halt after about 1 hour. I reverted back to
> > 128M and turned things back on again, but didn't realize at the time
> > that I came up with 9 fewer regions than I started with. Upon further
> > investigation, I found that all 9 missing regions were from splits
> > that occurred while the cluster was running after making the heap
> > change and before it came to a halt. There was a 10th region (5
> > splits involved in total) that managed to get recovered. The really
> > odd thing is that in the case of the other 9 regions, the original
> > parent regions, which as far as I can tell in the logs were deleted,
> > were re-opened upon restarting things once again. The daughter
> > regions were gone. Interestingly, I found the orphaned datablocks
> > still intact, and in at least some cases have been able to extract
> > the data from them and will hopefully re-add it to the tables.
> >
> > My question is this. Does anyone know, based on the rather muddled
> > description I've given above, what could have possibly happened here?
> > My best guess is that the bad state that hdfs was in caused some
> > critical component of the split process to be missed, which resulted
> > in a reference to the parent regions sticking around and the
> > references to the daughter regions being lost.
> >
> > Thanks for any insight you can provide.
> >
> > --Brennon
