Sure. I posted the code many weeks back for a tool that will repair holes in .META.
If you search the list archives, you should find it. I'll send you the latest code, since I may have made some fixes after I posted it. Please ping me if I forget. I've used it to repair huge tables (and fixed subtle bugs in the process), so I'm confident it works.

No matter what anyone tells me, I know HBase is horribly broken for the use case of doing bulk writes from an MR job. It shits the bed every time you pass a certain scale. For this reason we've completely rewritten our code so that we use bulk loading. It's way more efficient and always works.

Please ping me until I send you the code. Otherwise I will forget.

Sent from my iPhone

On Oct 29, 2011, at 1:39 PM, "Stuart Smith" <[email protected]> wrote:

> Hello Geoff,
>
> I usually don't show up here, since I use CDH, and good form means I should
> stay on CDH-users,
> But!
> I've been seeing the same issues for months:
>
> - PENDING_CLOSE too long, master tries to reassign - I see a continuous
> stream of these.
> - WrongRegionExceptions due to overlapping regions & holes in the regions.
>
> I just spent all day yesterday cribbing off of St.Ack's check_meta.rb script
> to write a java program to fix up overlaps & holes in an offline fashion
> (hbase down, directly on hdfs), and will start testing next week (cross my
> fingers!).
>
> It seems like the pending close messages can be ignored?
> And once I test my tool, and confirm I know a little bit about what I'm
> doing, maybe we could share notes?
>
> Take care,
> -stu
>
> ________________________________
> From: Geoff Hendrey <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Sent: Saturday, September 3, 2011 12:11 AM
> Subject: RE: PENDING_CLOSE for too long
>
> "Are you having trouble getting to any of your data out in tables?"
>
> depends what you mean. We see corruptions from time to time that prevent
> us from getting data, one way or another. Today's corruption was regions
> with duplicate start and end rows. We fixed that by deleting the
> offending regions from HDFS, and running add_table.rb to restore the
> meta. The other common corruption is the holes in ".META." that we
> repair with a little tool we wrote. We'd love to learn why we see these
> corruptions with such regularity (seemingly much higher than others on
> the list).
>
> We will implement timeout you suggest, and see how it goes.
>
> Thanks,
> Geoff
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Friday, September 02, 2011 10:51 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: PENDING_CLOSE for too long
>
> Are you having trouble getting to any of your data out in tables?
>
> To get rid of them, try restarting your master.
>
> Before you restart your master, do "HBASE-4126 Make timeoutmonitor
> timeout after 30 minutes instead of 3"; i.e. set
> "hbase.master.assignment.timeoutmonitor.timeout" to 1800000 in
> hbase-site.xml.
>
> St.Ack
>
> On Fri, Sep 2, 2011 at 1:40 PM, Geoff Hendrey <[email protected]>
> wrote:
> > In the master logs, I am seeing "regions in transition timed out" and
> > "region has been PENDING_CLOSE for too long, running forced unassign".
> > Both of these log messages occur at INFO level, so I assume they are
> > innocuous. Should I be concerned?
> >
> > -geoff
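
For readers following the thread: the "holes" and "overlaps" discussed above are breaks in the chain of region start/end keys. The sketch below is only an illustration of that idea, not the repair tool mentioned in this thread and not check_meta.rb. It assumes each region is given as a plain (start_key, end_key) pair of strings, with "" standing for the open first start key or last end key. The function name find_holes_and_overlaps is hypothetical, and real HBase keys are byte arrays compared bytewise rather than Python strings.

```python
from typing import List, Tuple

# Hypothetical representation: (start_key, end_key), "" means unbounded.
Region = Tuple[str, str]


def find_holes_and_overlaps(regions: List[Region]):
    """Return (holes, overlaps) for a list of region key ranges.

    holes    : key ranges (end_of_coverage, next_start) that no region covers
    overlaps : regions whose start key falls inside already-covered key space
    """
    holes: List[Region] = []
    overlaps: List[Region] = []
    ordered = sorted(regions, key=lambda r: r[0])
    if not ordered:
        return holes, overlaps

    # The first region is expected to start at the empty key.
    if ordered[0][0] != "":
        holes.append(("", ordered[0][0]))

    covered_end = ordered[0][1]  # furthest end key seen so far
    for start, end in ordered[1:]:
        if covered_end == "" or start < covered_end:
            # Starts before the previous coverage ended (includes duplicates).
            overlaps.append((start, end))
        elif start > covered_end:
            # Nothing covers the keys between covered_end and start.
            holes.append((covered_end, start))
        # Extend coverage if this region reaches further.
        if covered_end != "" and (end == "" or end > covered_end):
            covered_end = end

    # The last region is expected to end at the empty key.
    if covered_end != "":
        holes.append((covered_end, ""))
    return holes, overlaps


if __name__ == "__main__":
    sample = [("", "b"), ("b", "d"), ("f", "k"), ("h", "m"), ("m", "")]
    print(find_holes_and_overlaps(sample))
```

In the sample, no region covers the keys from "d" to "f" (a hole), and the region starting at "h" begins inside the region that ends at "k" (an overlap). Detecting the break is the easy part; deciding how to repair it is left out of this sketch.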
