Also, I should mention that some of the errors Andrew was seeing are related to ZOOKEEPER-344:
I see this kind of stuff:

2009-04-07 17:58:13,344 - WARN [NIOServerCxn.Factory:2181: nioserverc...@417] - Exception causing close of session 0x2208296c38e0000 due to java.io.IOException: Read error

and bye bye HRS ephemeral znodes, which triggers (currently) HBASE-1314.

This I think is ZOOKEEPER-344
https://issues.apache.org/jira/browse/ZOOKEEPER-344

- Andy

On Wed, Apr 8, 2009 at 12:39 PM, Nitay <nit...@gmail.com> wrote:
> Hey guys,
>
> We've recently replaced a few pieces of HBase's cluster management and
> coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster
> that he throws a lot of load at. Andrew's cluster was getting a lot of
> SessionExpired events which were causing some havoc. After some discussion
> on the hbase list and additional testing by Andrew (tweaking things like the
> session timeout, quorum size, and GC used), we suspect the problem is that
> the Java GC is starving the ZooKeeper heartbeat thread from executing.
>
> There is a JIRA open on the matter where Joey suggests a solution that has
> worked for him:
>
> https://issues.apache.org/jira/browse/HBASE-1316
>
> We wanted to loop you guys in to see if you have any thoughts/suggestions
> on the matter.
>
> Thanks,
> -n
>
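
One way to check the GC-starvation theory from the quoted message is to run a small watchdog thread in the same JVM as the ZooKeeper client and log whenever a short sleep overshoots badly. The sketch below is only an illustration using the standard library. It is not part of HBase or ZooKeeper and is not the fix proposed in HBASE-1316. The class name PauseMonitor, its method names, and the thresholds are all hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not an HBase or ZooKeeper class.
class PauseMonitor {

    // Extra time spent beyond the requested sleep, in ms; never negative.
    static long overshootMillis(long requestedMs, long elapsedNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        return Math.max(0L, elapsedMs - requestedMs);
    }

    // Assumption: a stall at least as long as the session timeout is
    // enough on its own for the server to expire the session.
    static boolean couldExpireSession(long overshootMs, long sessionTimeoutMs) {
        return overshootMs >= sessionTimeoutMs;
    }

    // Starts a daemon thread that sleeps in a loop and reports stalls
    // of at least warnMs to stderr.
    static Thread startDaemon(final long sleepMs, final long warnMs,
                              final long sessionTimeoutMs) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    long start = System.nanoTime();
                    Thread.sleep(sleepMs);
                    long overshoot =
                        overshootMillis(sleepMs, System.nanoTime() - start);
                    if (overshoot >= warnMs) {
                        String note = couldExpireSession(overshoot, sessionTimeoutMs)
                            ? " (long enough to expire the session)" : "";
                        System.err.println("JVM stalled ~" + overshoot + " ms" + note);
                    }
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag and let the thread exit.
                Thread.currentThread().interrupt();
            }
        }, "pause-monitor");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With made-up values, PauseMonitor.startDaemon(500, 1000, 10000) would wake every 500 ms and warn on stalls of a second or more. If the warnings line up in time with the SessionExpired events, that would support the GC theory. If they do not, the Read error in the log above may point elsewhere.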