Stuart - Have you disabled splitting? I believe you can work around the issue of PENDING_CLOSE by presplitting your table and disabling splitting. Worked for us.
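
For anyone wanting to try the presplit workaround, here is a rough sketch of computing the split points up front. It is only an illustration, not what we actually ran: the class and method names (PresplitKeys, hexSplitKeys) are made up, and it assumes row keys begin with a uniformly distributed two-hex-digit prefix (for example, hashed keys). The resulting array is the kind of thing you would hand to HBaseAdmin.createTable(desc, splitKeys); disabling splitting is then commonly approximated by setting a very large max file size on the table, which is left out here so the snippet runs without the HBase jars.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: not part of HBase, names are illustrative only.
final class PresplitKeys {

    /**
     * Returns numRegions - 1 split keys that cut the two-hex-digit
     * prefix space "00".."ff" into roughly equal ranges.
     * Assumes row keys start with such a prefix.
     */
    static byte[][] hexSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary i of numRegions, scaled onto the 0..255 prefix space.
            int boundary = (i * 256) / numRegions;
            keys[i - 1] = String.format("%02x", boundary)
                    .getBytes(StandardCharsets.US_ASCII);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Print the split keys for a table presplit into 4 regions.
        for (byte[] key : hexSplitKeys(4)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```

With the HBase client on the classpath, the array would be passed as the second argument when creating the table, before any bulk write starts. If your keys are not evenly distributed, pick the boundaries from a sample of real keys instead.
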
Sent from my iPhone

On Oct 29, 2011, at 4:19 PM, "Ted Yu" <[email protected]> wrote:

> In 0.92 (to be released in 2 weeks), you can expect improvement in this
> regard.
> See HBASE-3368.
>
> Geoff:
> Can you publish your tool on HBASE JIRA ?
>
> Thanks
>
> On Sat, Oct 29, 2011 at 2:35 PM, Geoff Hendrey <[email protected]> wrote:
>
> > Sure. I posted the code many weeks back for a tool that will repair holes
> > in .META.
> >
> > If you do a check on the list, you should find it. I'll send you the
> > latest code for that. Maybe I made some fixes after I posted the code.
> > Please ping me if I forget. I've used it to repair huge tables (and fixed
> > subtle bugs in the process) so I'm confident it works.
> >
> > No matter what anyone tells me, I know hbase is horribly broken for the
> > use case of doing bulk writes from an mr job. It shits the bed every time
> > you pass a certain scale. For this reason we've completely rewritten our
> > code so that we use bulkloading. It's way more efficient and always works.
> >
> > Please ping me until I send you the code. Otherwise I will forget.
> >
> > Sent from my iPhone
> >
> > On Oct 29, 2011, at 1:39 PM, "Stuart Smith" <[email protected]> wrote:
> >
> > > Hello Geoff,
> > >
> > > I usually don't show up here, since I use CDH, and good form means I
> > > should stay on CDH-users,
> > > But!
> > > I've been seeing the same issues for months:
> > >
> > > - PENDING_CLOSE too long, master tries to reassign - I see a
> > >   continuous stream of these.
> > > - WrongRegionExceptions due to overlapping regions & holes in the
> > >   regions.
> > >
> > > I just spent all day yesterday cribbing off of St.Ack's check_meta.rb
> > > script to write a java program to fix up overlaps & holes in an offline
> > > fashion (hbase down, directly on hdfs), and will start testing next week
> > > (cross my fingers!).
> > >
> > > It seems like the pending close messages can be ignored?
> > > And once I test my tool, and confirm I know a little bit about what I'm
> > > doing, maybe we could share notes?
> > >
> > > Take care,
> > > -stu
> > >
> > > ________________________________
> > > From: Geoff Hendrey <[email protected]>
> > > To: [email protected]
> > > Cc: [email protected]
> > > Sent: Saturday, September 3, 2011 12:11 AM
> > > Subject: RE: PENDING_CLOSE for too long
> > >
> > > "Are you having trouble getting to any of your data out in tables?"
> > >
> > > depends what you mean. We see corruptions from time to time that prevent
> > > us from getting data, one way or another. Today's corruption was regions
> > > with duplicate start and end rows. We fixed that by deleting the
> > > offending regions from HDFS, and running add_table.rb to restore the
> > > meta. The other common corruption is the holes in ".META." that we
> > > repair with a little tool we wrote. We'd love to learn why we see these
> > > corruptions with such regularity (seemingly much higher than others on
> > > the list).
> > >
> > > We will implement timeout you suggest, and see how it goes.
> > >
> > > Thanks,
> > > Geoff
> > >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]] On Behalf Of
> > > Stack
> > > Sent: Friday, September 02, 2011 10:51 PM
> > > To: [email protected]
> > > Cc: [email protected]
> > > Subject: Re: PENDING_CLOSE for too long
> > >
> > > Are you having trouble getting to any of your data out in tables?
> > >
> > > To get rid of them, try restarting your master.
> > >
> > > Before you restart your master, do "HBASE-4126 Make timeoutmonitor
> > > timeout after 30 minutes instead of 3"; i.e. set
> > > "hbase.master.assignment.timeoutmonitor.timeout" to 1800000 in
> > > hbase-site.xml.
> > >
> > > St.Ack
> > >
> > > On Fri, Sep 2, 2011 at 1:40 PM, Geoff Hendrey <[email protected]>
> > > wrote:
> > > > In the master logs, I am seeing "regions in transition timed out" and
> > > > "region has been PENDING_CLOSE for too long, running forced unassign".
> > > > Both of these log messages occur at INFO level, so I assume they are
> > > > innocuous. Should I be concerned?
> > > >
> > > > -geoff
