Re: PENDING_CLOSE for too long

Geoff Hendrey Sat, 29 Oct 2011 14:36:08 -0700

Sure. I posted the code many weeks back for a tool that will repair holes in 
.META.


If you search the list archives, you should find it. I'll send you the latest 
code for that, since I may have made some fixes after I posted it. Please ping 
me if I forget. I've used it to repair huge tables (and fixed subtle bugs in the 
process) so I'm confident it works.
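
For anyone following along who hasn't looked at what a "hole" in .META. 
actually is: the regions of a table, sorted by start key, should form an 
unbroken chain where each region's end key equals the next region's start key. 
Below is a minimal sketch of that check only. It is not the repair tool 
mentioned above, and the function name and the (start_key, end_key) tuple 
layout are made up for illustration. It assumes an empty key means "start of 
table" or "end of table", and that keys compare as plain byte strings.

```python
from typing import List, Tuple

# Hypothetical layout: (start_key, end_key); b"" means open-ended.
Region = Tuple[bytes, bytes]


def _end_max(a: bytes, b: bytes) -> bytes:
    # An empty end key means "end of table", so it beats any other end key.
    return b"" if a == b"" or b == b"" else max(a, b)


def _end_min(a: bytes, b: bytes) -> bytes:
    # The smaller of two end keys, treating b"" as infinity.
    if a == b"":
        return b
    if b == b"":
        return a
    return min(a, b)


def find_holes_and_overlaps(regions: List[Region]):
    """Return (holes, overlaps) as lists of (start_key, end_key) ranges."""
    holes: List[Region] = []
    overlaps: List[Region] = []
    if not regions:
        return holes, overlaps

    ordered = sorted(regions, key=lambda r: r[0])

    # The first region should start at the empty key.
    if ordered[0][0] != b"":
        holes.append((b"", ordered[0][0]))

    covered_to = ordered[0][1]  # furthest end key covered so far
    for start, end in ordered[1:]:
        if covered_to == b"":
            # An earlier region already runs to the end of the table.
            overlaps.append((start, end))
            continue
        if start > covered_to:
            holes.append((covered_to, start))
        elif start < covered_to:
            overlaps.append((start, _end_min(covered_to, end)))
        covered_to = _end_max(covered_to, end)

    # The last region should end at the empty key.
    if covered_to != b"":
        holes.append((covered_to, b""))
    return holes, overlaps


# Example: a hole between "d" and "f", and an overlap between "g" and "h".
regions = [(b"", b"b"), (b"b", b"d"), (b"f", b"h"), (b"g", b"")]
holes, overlaps = find_holes_and_overlaps(regions)
```

Two regions with identical start and end rows show up as an overlap in this 
check, since the second one starts before the first one ends.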

No matter what anyone tells me, I know HBase is horribly broken for the use 
case of doing bulk writes from an MR job. It shits the bed every time you pass 
a certain scale. For this reason we've completely rewritten our code so that we 
use bulk loading. It's way more efficient and always works.

Please ping me until I send you the code. Otherwise I will forget. 

Sent from my iPhone

On Oct 29, 2011, at 1:39 PM, "Stuart Smith" <[email protected]> wrote:

> Hello Geoff,
> 
>   I usually don't show up here, since I use CDH, and good form means I should 
> stay on CDH-users,
> But!
>   I've been seeing the same issues for months:
> 
>  - PENDING_CLOSE too long, master tries to reassign - I see a continuous 
> stream of these.
>  - WrongRegionExceptions due to overlapping regions & holes in the regions.
> 
> I just spent all day yesterday cribbing off of St.Ack's check_meta.rb script 
> to write a Java program to fix up overlaps & holes in an offline fashion 
> (hbase down, directly on hdfs), and will start testing next week (cross my 
> fingers!).
> 
> It seems like the pending close messages can be ignored?
> And once I test my tool, and confirm I know a little bit about what I'm 
> doing, maybe we could share notes?
> 
> Take care,
>   -stu
> 
> 
> 
> ________________________________
> From: Geoff Hendrey <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Sent: Saturday, September 3, 2011 12:11 AM
> Subject: RE: PENDING_CLOSE for too long
> 
> "Are you having trouble getting to any of your data out in tables?"
> 
> Depends on what you mean. We see corruptions from time to time that prevent
> us from getting data, one way or another. Today's corruption was regions
> with duplicate start and end rows. We fixed that by deleting the
> offending regions from HDFS, and running add_table.rb to restore the
> meta. The other common corruption is the holes in ".META." that we
> repair with a little tool we wrote. We'd love to learn why we see these
> corruptions with such regularity (seemingly much higher than others on
> the list).
> 
> We will implement the timeout you suggest, and see how it goes.
> 
> Thanks,
> Geoff
> 
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Friday, September 02, 2011 10:51 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: PENDING_CLOSE for too long
> 
> Are you having trouble getting to any of your data out in tables?
> 
> To get rid of them, try restarting your master.
> 
> Before you restart your master, do "HBASE-4126  Make timeoutmonitor
> timeout after 30 minutes instead of 3"; i.e. set
> "hbase.master.assignment.timeoutmonitor.timeout" to 1800000 in
> hbase-site.xml.
> 
> St.Ack
> 
> On Fri, Sep 2, 2011 at 1:40 PM, Geoff Hendrey <[email protected]>
> wrote:
> > In the master logs, I am seeing "regions in transition timed out" and
> > "region has been PENDING_CLOSE for too long, running forced unassign".
> > Both of these log messages occur at INFO level, so I assume they are
> > innocuous. Should I be concerned?
> >
> >
> >
> > -geoff
> >
> >
