Have you tried running check_meta.rb with --fix?

On Sat, Jul 2, 2011 at 9:19 AM, Wayne <[email protected]> wrote:

> We are running 0.90.3. We were testing the table export, not realizing the
> data goes to the root drive and not HDFS. The export filled the master's
> root partition. The logger had issues and HDFS got corrupted
> ("java.io.IOException: Incorrect data format. logVersion is -18 but
> writables.length is 0"). We had to run hadoop fsck -move to fix the
> corrupted HDFS files. We were able to get HDFS running without issues, but
> HBase ended up with the region issues.
>
> We also had another issue with Ganglia that made it worse. We had moved the
> Ganglia host to the master server, and Ganglia took up so many resources
> that it actually caused timeouts talking to the master, and most nodes ended
> up shutting down. I guess Ganglia is a pig in terms of resources...
>
> I just tried to manually edit the .META. table, removing the remnants of the
> old table, but the shell went haywire on me and turned to control
> characters..??... I ended up corrupting the whole thing and had to delete
> all tables... we have just not had a good week.
>
> I will add comments to HBASE-3695 in terms of suggestions.
>
> Thanks.
>
> On Fri, Jul 1, 2011 at 4:55 PM, Stack <[email protected]> wrote:
>
> > What version of hbase are you on Wayne?
> >
> > On Fri, Jul 1, 2011 at 8:32 AM, Wayne <[email protected]> wrote:
> > > I ran the hbck command and found 14 inconsistencies. There were files in
> > > HDFS not used for region
> >
> > These are usually harmless.  Bad accounting on our part.  Need to plug the
> > hole.
> >
> > >, regions with the same start key, a hole in the
> > > region chain, and a missing start region with an empty key.
> >
> > These are pretty serious.
> >
> > How'd the master running out of root partition do this?  I'd be
> > interested to know.
> >
> > > We are not in production so we have the luxury to start again, but the
> > > damage to our confidence is severe. Is there work going on to improve hbck
> > > -fix to actually be able to resolve these types of issues? Do we need to
> > > expect to run a production hbase cluster to be able to move around and
> > > rebuild the region definitions and the .META. table by hand? Things just got
> > > a lot scarier fast for us, especially since we were hoping to go into
> > > production next month. Running out of disk space on the master's root
> > > partition can bring down the entire cluster? This is scary...
> > >
> >
> > Understood.
> >
> > St.Ack
> >
>
