We are running 0.90.3. We were testing the table export, not realizing that
the data goes to the root drive and not to HDFS. The export filled the
master's root partition. The HDFS edit log had problems and HDFS got
corrupted ("java.io.IOException: Incorrect data format. logVersion is -18
but writables.length is 0"). We had to run hadoop fsck -move to fix the
corrupted HDFS files. We were able to get HDFS running again without issues,
but HBase ended up with the region issues.
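
For reference, these were roughly the commands involved (invocations from
memory, so paths and options may need adjusting for your install):

    # check HDFS health; -move sends corrupt files to /lost+found
    hadoop fsck /
    hadoop fsck / -move

    # check HBase region/table consistency; -fix attempts automatic repairs
    hbase hbck
    hbase hbck -fix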

We also had another issue making it worse with Ganglia. We had moved the
Ganglia host to the master server, and Ganglia took up so many resources
that it actually caused timeouts talking to the master; most nodes ended up
shutting down. I guess Ganglia is a pig in terms of resources...

I just tried to manually edit the .META. table to remove the remnants of the
old table, but the shell went haywire on me and turned to control
characters..??... I ended up corrupting the whole thing and had to delete
all tables... We have just not had a good week.

I will add comments to HBASE-3695 in terms of suggestions.

Thanks.

On Fri, Jul 1, 2011 at 4:55 PM, Stack <[email protected]> wrote:

> What version of hbase are you on Wayne?
>
> On Fri, Jul 1, 2011 at 8:32 AM, Wayne <[email protected]> wrote:
> > I ran the hbck command and found 14 inconsistencies. There were files
> > in hdfs not used for regions,
>
> These are usually harmless. Bad accounting on our part. Need to plug the
> hole.
>
> > regions with the same start key, a hole in the region chain, and a
> > missing start region with an empty key.
>
> These are pretty serious.
>
> How'd the master running out of root partition do this? I'd be
> interested to know.
>
> > We are not in production so we have the luxury to start again, but the
> > damage to our confidence is severe. Is there work going on to improve
> > hbck -fix to actually be able to resolve these types of issues? Do we
> > need to expect to run a production hbase cluster to be able to move
> > around and rebuild the region definitions and the .META. table by hand?
> > Things just got a lot scarier fast for us, especially since we were
> > hoping to go into production next month. Running out of disk space on
> > the master's root partition can bring down the entire cluster? This is
> > scary...
>
> Understood.
>
> St.Ack
