Hi,

We have been successfully running a CDH3 HBase cluster on c1.xlarge instances for over a month, but we recently started hitting what looks like connectivity issues in the cluster. ZooKeeper sessions are expired by the ZooKeeper server, and the region servers throw a YouAreDeadException before crashing.

Could this be caused by the GC? Is there anything I can do about it? I am monitoring the Ganglia metrics but am unsure of their semantics (where can I find them documented?). I know that running HBase on EC2 is advised against, but we really need to get this working.

Thanks,
Nicolas

ZooKeeper log: http://pastebin.com/bVjrkRSL
RegionServer log: http://pastebin.com/fU81d8hr
