Maybe we can kill the ZooKeeper connection in the abort handler.
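A minimal sketch of that idea, assuming a hypothetical `CoordinationSession` wrapper around the ZooKeeper client. `CoordinationSession`, `FakeSession` and `Server` are illustrative names, not HBase classes. The point is ordering. The abort path closes the session *first*, so the region server's ephemeral znode goes away and the master should see the server as dead. Only then does it attempt the orderly shutdown, on a daemon thread, so a hung shutdown (like the one below) no longer keeps the regions pinned.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortSketch {

    /** Hypothetical stand-in for the ZooKeeper session (real code would call ZooKeeper#close()). */
    interface CoordinationSession {
        void close();
        boolean isClosed();
    }

    /** In-memory fake so the sketch runs without a ZooKeeper ensemble. */
    static final class FakeSession implements CoordinationSession {
        private volatile boolean closed = false;
        @Override public void close() { closed = true; }
        @Override public boolean isClosed() { return closed; }
    }

    /** Hypothetical server with an abort handler; not HBase's HRegionServer. */
    static final class Server {
        private final CoordinationSession session;
        private final Runnable orderlyShutdown;
        private final AtomicBoolean aborted = new AtomicBoolean(false);

        Server(CoordinationSession session, Runnable orderlyShutdown) {
            this.session = session;
            this.orderlyShutdown = orderlyShutdown;
        }

        void abort(String why, Throwable cause) {
            // Abort at most once, even if several threads hit fatal errors.
            if (!aborted.compareAndSet(false, true)) {
                return;
            }
            System.err.println("ABORTING: " + why + ": " + cause);
            // Step 1: drop the coordination session so the ephemeral registration disappears.
            try {
                session.close();
            } catch (RuntimeException e) {
                System.err.println("Failed to close session: " + e);
            }
            // Step 2: best-effort orderly shutdown; a daemon thread cannot keep the JVM alive if it hangs.
            Thread t = new Thread(orderlyShutdown, "orderly-shutdown");
            t.setDaemon(true);
            t.start();
        }

        boolean isAborted() { return aborted.get(); }
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        // Simulate a shutdown step that never completes, like the hung RS in this thread.
        Server server = new Server(session, () -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.abort("StackOverflowError during compaction", new StackOverflowError());
        System.out.println("session closed: " + session.isClosed()); // session closed: true
        System.out.println("aborted: " + server.isAborted());        // aborted: true
    }
}
```

Closing the session before, rather than after, the orderly shutdown trades a clean region close for faster failover. That seems to be the trade-off being proposed here, but it is a design choice, not established HBase behavior.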
On Fri, May 30, 2014 at 9:38 AM, Buckley,Ron <[email protected]> wrote:
> Thanks Ted. I should have seen that.
>
> I finally had to 'kill -9' the RS, as I couldn't get it to shut down any
> other way.
>
> It seems like the Region Server shouldn't have kept telling ZooKeeper that
> all was well, even though it was trying to abort with a fatal error.
>
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Friday, May 30, 2014 12:11 PM
> To: [email protected]
> Subject: Re: Region Server hung during shutdown after StackOverflow error
>
> Looking at the StackOverflowError in pastebin, the cause was too many
> calls to subList().
> J-D fixed one similar bug in HBASE-10312.
>
> I searched for '\.subList(' in the 0.94 codebase but haven't pinpointed which
> class was the source of such calls.
>
> Will dig deeper when I have time.
>
> Cheers
>
>
> On Fri, May 30, 2014 at 8:24 AM, Buckley,Ron <[email protected]> wrote:
>
> > Interesting case happened on our dev HBase cluster overnight. (We're
> > running HBase 0.94.15 from CDH 4.6.0.)
> >
> > A region server took a StackOverflow error, it looks like during
> > a minor compaction.
> >
> > The region server is trying to shut down with a Fatal, but is now hung
> > during shutdown.
> >
> > The particularly troublesome thing is that the RS is alive enough to
> > keep ZooKeeper happy.
> >
> > So, the regions aren't moving off, but our apps can't get to them
> > because the RS is mostly dead.
> >
> > I put some of the details on pastebin.
> >
> > JStack -> http://pastebin.com/hnLtaG54
> > Outfile -> http://pastebin.com/5F1UcGjg
> > Logfile -> http://pastebin.com/TBL1YSZM
>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
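Regarding Ted's diagnosis (too many calls to subList()), here is a sketch of one common way such a chain arises, under stated assumptions. The actual call site in 0.94 was not identified in this thread, and `SubListBatching`, `batchByReslicing` and `batchByIndex` are hypothetical names. Repeatedly re-slicing the remainder (`remaining = remaining.subList(n, ...)`) wraps each view in another view. On some JDK versions and list implementations (for example `AbstractList`'s generic sublist before JDK 9), operations on such a view delegate to the parent one level at a time, so a long enough chain can exhaust the stack. Taking every view directly from the original list keeps the nesting depth at one.

```java
import java.util.ArrayList;
import java.util.List;

public class SubListBatching {

    // Risky pattern: each iteration wraps the previous view, building view-of-a-view chains.
    static List<List<Integer>> batchByReslicing(List<Integer> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> remaining = items;
        while (!remaining.isEmpty()) {
            int n = Math.min(batchSize, remaining.size());
            batches.add(new ArrayList<>(remaining.subList(0, n)));
            remaining = remaining.subList(n, remaining.size()); // nesting depth grows by one
        }
        return batches;
    }

    // Flat pattern: every view is taken from the original list, so depth never exceeds one.
    static List<List<Integer>> batchByIndex(List<Integer> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<Integer>> batches = new ArrayList<>();
        int from = 0;
        while (from < items.size()) {
            int to = from + Math.min(batchSize, items.size() - from); // avoids int overflow
            batches.add(new ArrayList<>(items.subList(from, to)));     // copy, don't keep the view
            from = to;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) items.add(i);
        System.out.println(batchByIndex(items, 4)); // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
        System.out.println(batchByReslicing(items, 4).equals(batchByIndex(items, 4))); // true
    }
}
```

Both methods produce the same batches. The difference is only in how many view layers sit between each access and the backing list.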
