Thanks, Ted. I should have seen that. I finally had to 'kill -9' the RS, as I couldn't get it to shut down any other way.
It seems like the Region Server shouldn't have kept telling ZooKeeper that all was well while it was trying to abort with a fatal error.

-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: Friday, May 30, 2014 12:11 PM
To: [email protected]
Subject: Re: Region Server hung during shutdown after StackOverflow error

Looking at the StackOverflowError in pastebin, the cause was too many calls to subList().

J-D fixed one similar bug in HBASE-10312.

I searched for '\.subList(' in the 0.94 codebase but haven't pinpointed which class was the source of such calls. Will dig deeper when I have time.

Cheers

On Fri, May 30, 2014 at 8:24 AM, Buckley,Ron <[email protected]> wrote:

> An interesting case happened on our dev HBase cluster overnight. (We're
> running HBase 0.94.15 from CDH 4.6.0.)
>
> A region server took a StackOverflow error, it looks like during
> a minor compaction.
>
> The region server is trying to shut down with a Fatal, but is now hung
> during shutdown.
>
> The particularly troublesome thing is that the RS is alive enough to
> keep ZooKeeper happy.
>
> So, the regions aren't moving off, but our apps can't get to them
> because the RS is mostly dead.
>
> I put some of the details on pastebin.
>
> JStack -> http://pastebin.com/hnLtaG54
> Outfile -> http://pastebin.com/5F1UcGjg
> Logfile -> http://pastebin.com/TBL1YSZM
