I have not tested GZ compression with a 256 MB region size yet. I will when I start a new round of testing. Thanks for the idea,
-chris

On Mon, Jan 10, 2011 at 12:54 PM, Sandy Pratt <[email protected]> wrote:

> Chris,
>
> I'm curious if this happens when hbase.hregion.max.filesize is set to the
> default 256m. Have you tested it?
>
> Sandy
>
> > -----Original Message-----
> > From: Christopher Tarnas [mailto:[email protected]] On Behalf Of Chris Tarnas
> > Sent: Friday, January 07, 2011 23:07
> > To: [email protected]
> > Subject: Strange regionserver behavior with GZ compression
> >
> > Thanks in advance for any help. I've been quite pleased with HBase for
> > this current project, and until this problem it has worked quite well.
> >
> > Test cluster setup is CDH3b3 on 7 nodes:
> > 5 data nodes with 48GB RAM, 8 cores, 4 disks,
> > 2 masters with 8 cores, 2 disks, 24GB RAM for master/zookeeper/namenode
> >
> > My hbase.hregion.max.filesize is set to 1GB, ulimit files to 32k and
> > xceivers to 4096, hbase heap is at 8GB.
> >
> > I'm testing out using GZ compression on two tables; each is currently
> > still only one region. My tests run fine when compression is off, so
> > this is definitely related to compression. When I start loading data
> > (via thrift, many clients) it loads great for a while, then the region
> > servers slow to a crawl. When this happens the two regionservers that
> > are hosting the tables use ~110-160% CPU and block writes. One
> > regionserver has occasional bursts of activity but mostly is very
> > repetitive. Here is a sample of the log:
> >
> > http://pastebin.com/WSc8aZFQ
> >
> > The other active regionserver looks to be continuously compacting:
> >
> > http://pastebin.com/3ifVKaX2
> >
> > The master log is quite boring, with this being repeated:
> >
> > 2011-01-08 00:48:58,419 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>}
> > 2011-01-08 00:48:58,424 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>} complete
> > 2011-01-08 00:48:58,444 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 1.6
> > 2011-01-08 00:49:04,810 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>}
> > 2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 6 row(s) of meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>} complete
> > 2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
> >
> > At this point loading slows to a trickle (requests are 0 in the web ui).
> > I can see infrequent bursts of loading, but very small amounts. Each
> > table only has one region (and there are only two other tables, each
> > also with only one region).
> >
> > I've compiled and tested the native GZ compression codecs on the nodes,
> > and the nodes have plenty of CPU, IO and memory available and no
> > swapping. Any suggestions? Please let me know if you need any other info.
> >
> > thanks!
> > -chris
