Thanks in advance for any help. I've been quite pleased with HBase for this 
current project, and until this problem it has worked quite well.

Test cluster setup is CDH3b3 on 7 nodes: 
5 data nodes, each with 48GB RAM, 8 cores, 4 disks 
2 masters, each with 24GB RAM, 8 cores, 2 disks, for master/zookeeper/namenode

My hbase.hregion.max.filesize is set to 1GB, the open-file ulimit to 32k, 
xceivers to 4096, and the HBase heap to 8GB.
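
As a rough sketch of how two of those settings look as raw config values: 
hbase.hregion.max.filesize is given in bytes, and I'm assuming "xceivers" 
means the HDFS property dfs.datanode.max.xcievers (note the spelling) in 
hdfs-site.xml. The helper name render_properties is only illustrative. The 
ulimit and heap settings live outside these XML files and are not shown.

```python
import xml.etree.ElementTree as ET

# Values from the post, converted to the raw units the config files expect.
HBASE_SITE = {
    "hbase.hregion.max.filesize": 1 * 1024**3,  # 1GB expressed in bytes
}
HDFS_SITE = {
    # Assumption: "xceivers" refers to this (misspelled) HDFS property.
    "dfs.datanode.max.xcievers": 4096,
}

def render_properties(settings):
    """Render a dict as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

print(render_properties(HBASE_SITE))
print(render_properties(HDFS_SITE))
```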

I'm testing GZ compression on two tables, each of which is currently still 
only one region. My tests run fine when compression is off, so this is 
definitely related to compression. When I start loading data (via thrift, many 
clients) it loads great for a while, then the region servers slow to a crawl. 
When this happens the two regionservers hosting the tables use ~110-160% CPU 
and block writes. One regionserver's log shows occasional bursts of activity 
but is mostly very repetitive; here is a sample:

http://pastebin.com/WSc8aZFQ

The other active regionserver looks to be continuously compacting:

http://pastebin.com/3ifVKaX2
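
To put the CPU figure above in context, here is a small sketch of the 
arithmetic. It assumes the 110-160% is a top-style number where 100% means 
one full core, and uses the 8 cores per data node from the cluster 
description. The function name cpu_share is only illustrative.

```python
CORES_PER_NODE = 8  # from the cluster description

def cpu_share(top_percent, cores=CORES_PER_NODE):
    """Convert a top-style CPU percentage (100 = one full core) into
    (cores busy, fraction of the whole node's CPU capacity)."""
    cores_busy = top_percent / 100.0
    return cores_busy, cores_busy / cores

for pct in (110, 160):
    busy, share = cpu_share(pct)
    print(f"{pct}% -> {busy:.1f} cores busy, {share:.1%} of the node")
```

Under that assumption each busy regionserver is using between one and two 
cores, which still leaves most of the node's CPU idle.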


The master log is quite boring with this being repeated:

2011-01-08 00:48:58,419 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>}
2011-01-08 00:48:58,424 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>} complete
2011-01-08 00:48:58,444 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 1.6
2011-01-08 00:49:04,810 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>}
2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 6 row(s) of meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>} complete
2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned


At this point loading slows to a trickle (requests show as 0 in the web UI). I 
can see infrequent bursts of loading, but only very small amounts. Each table 
has only one region (and there are only two other tables, each also with only 
one region).

I've compiled and tested the native GZ compression codecs on the nodes, and 
the nodes have plenty of CPU, IO, and memory available with no swapping. Any 
suggestions? Please let me know if you need any other info.

thanks!
-chris
