Maybe we can kill the ZooKeeper connection in the abort handler.
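
Roughly what I have in mind, as a sketch only and not the actual HRegionServer code: close the ZooKeeper session as the first step of abort, so the ephemeral znode goes away and the master can reassign the regions even if the rest of the shutdown hangs. Every name below (AbortSketch, CoordinationSession, abort, steps) is hypothetical and stands in for whatever the real classes are.

```java
import java.util.ArrayList;
import java.util.List;

public class AbortSketch {
    /** Hypothetical stand-in for the ZooKeeper session held by the region server. */
    public interface CoordinationSession {
        void close();
    }

    /** Records the order of the steps so the ordering can be inspected. */
    public static final List<String> steps = new ArrayList<>();

    /**
     * Assumed abort flow: give up the session before any shutdown work that
     * might block, so liveness is no longer advertised while aborting.
     */
    public static void abort(String why, CoordinationSession session, Runnable restOfShutdown) {
        steps.add("abort: " + why);
        try {
            session.close();
            steps.add("session closed");
        } catch (RuntimeException e) {
            // A failed close should not prevent the remaining shutdown steps.
            steps.add("session close failed");
        }
        // Anything that may hang (flushes, closing regions, ...) runs afterwards.
        restOfShutdown.run();
    }

    public static void main(String[] args) {
        abort("StackOverflowError during compaction",
              () -> steps.add("zk close requested"),
              () -> steps.add("rest of shutdown"));
        System.out.println(steps);
    }
}
```

The only point is the ordering: the session is dropped before anything that can block, so a hung shutdown no longer looks healthy from the outside.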

On Fri, May 30, 2014 at 9:38 AM, Buckley,Ron <[email protected]> wrote:

> Thanks Ted. I should have seen that.
>
> I finally had to 'kill -9' the RS, as I couldn't get it to shut down any
> other way.
>
> It seems like the Region Server shouldn't have kept telling ZooKeeper that
> all was well while it was trying to abort with a fatal error.
>
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Friday, May 30, 2014 12:11 PM
> To: [email protected]
> Subject: Re: Region Server hung during shutdown after StackOverflow error
>
> Looking at the StackOverflowError in pastebin, the cause was too many
> calls to subList().
> J-D fixed a similar bug in HBASE-10312.
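
For anyone following along, here is a minimal illustration of the general pattern. It is not the code from the pastebin trace or from HBASE-10312. The assumption is that a list gets re-wrapped with subList() in a loop, so each pass produces a view of a view; on older JDKs some operations on such a view delegate to the parent view, so the call depth grows with the nesting (newer JDKs may behave differently). The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SubListSketch {
    /** Risky pattern: every iteration wraps the previous view in another view. */
    public static List<Integer> trimByNesting(List<Integer> files, int keep) {
        List<Integer> view = files;
        while (view.size() > keep) {
            view = view.subList(1, view.size()); // view of a view of a view ...
        }
        return view;
    }

    /** Safer pattern: work out the final range once and copy it into a flat list. */
    public static List<Integer> trimByCopying(List<Integer> files, int keep) {
        int from = Math.max(0, files.size() - keep);
        return new ArrayList<>(files.subList(from, files.size()));
    }

    public static void main(String[] args) {
        List<Integer> files = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            files.add(i);
        }
        System.out.println(trimByNesting(files, 3)); // small input, so nesting stays shallow
        System.out.println(trimByCopying(files, 3));
    }
}
```

Both methods return the same elements; the difference is that the copying version never holds more than one level of view, whatever the input size.
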
>
> I searched for '\.subList(' in the 0.94 codebase but haven't pinpointed
> which class was the source of such calls.
>
> Will dig deeper when I have time.
>
> Cheers
>
>
> On Fri, May 30, 2014 at 8:24 AM, Buckley,Ron <[email protected]> wrote:
>
> > An interesting case happened on our dev HBase cluster overnight. (We're
> > running HBase 0.94.15 from CDH 4.6.0.)
> >
> > A region server hit a StackOverflowError, apparently during a minor
> > compaction.
> >
> > The region server is trying to shut down with a Fatal, but is now hung
> > during shutdown.
> >
> > The particularly troublesome thing is that the RS is alive enough to
> > keep ZooKeeper happy.
> >
> > So the regions aren't moving off, but our apps can't get to them
> > because the RS is mostly dead.
> >
> > I put some of the details on pastebin.
> >
> > JStack  -> http://pastebin.com/hnLtaG54
> > Outfile -> http://pastebin.com/5F1UcGjg
> > Logfile -> http://pastebin.com/TBL1YSZM
> >
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
