Thanks in advance for any help. I've been quite pleased with Hbase for this current project and until this problem it has worked quite well.
Test cluster setup is CDH3b3 on a 7 nodes: 5 data nodes with 48GB RAM, 8 cores, 4 disks, 2 masters with 8 cores, 2 disks 24GB RAM for master/zookeeper/namenode My hbase.hregion.max.filesize is set to 1GB, ulimit files to 32k and xceivers to 4096, hbase heap is at 8GB. I'm testing out using GZ compression on two tables, each is currently still only one region. My tests runs fine when compression is off so this is definitely related to compression. When I start loading data (via thrift, many clients) it loads great for a while then the region servers slow to crawl. When this happens the two regionservers that are hosting the tables use ~ 110-160% CPU and block writes. One regionserver has occasional bursts of activity but mostly is very repetitive, here is a sample of the log: http://pastebin.com/WSc8aZFQ The other active regionserver looks to be continuously compacting: http://pastebin.com/3ifVKaX2 The master log is quite boring with this being repeated: 2011-01-08 00:48:58,419 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>} 2011-01-08 00:48:58,424 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>} complete 2011-01-08 00:48:58,444 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 1.6 2011-01-08 00:49:04,810 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>} 2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 6 row(s) of meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>} complete 2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned At this point loading slows to a trickle (requests are 0 in the web ui), I can see infrequent bursts of loading but very small amounts. Each table only has one region (and there are only two other tables, each also with only one region). I've compiled and tested the native GZ compression codecs on the nodes and the nodes have plenty of CPU, IO and memory available and no swapping. Any suggestions? Please let me know if you need any other info. thanks! -chris
