More details on what I am seeing: I set the region size back to the default (256MB) and got much better performance with fewer pauses for compaction. I loaded until I hit about 150 total regions in the table I am loading now (30 per regionserver) and then set hbase.hregion.max.filesize back up to 1GB (1073741824 is the actual setting I used). After restarting the cluster I ran another load test. There were many more pauses for compactions that halted the whole cluster, and I got roughly 50% of the write speed I had before. Compression was not enabled.
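In case it matters, this is roughly the hbase-site.xml entry I have been toggling between runs (standard property name; 1073741824 is the 1GB value from this test, and the 256MB default works out to 268435456):

  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- 1GB for this run; the 256MB default is 268435456 -->
    <value>1073741824</value>
  </property>

The cluster was restarted after each change so the new value would take effect.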
thanks for any help,
-chris

On Wed, Jan 12, 2011 at 3:46 PM, Chirstopher Tarnas <[email protected]> wrote:

> I'm doing a test now w/o any GZ compression enabled and I am seeing the
> same pauses in loading... any more ideas? I will try dropping my region size
> down to 256 MB next. Currently I cannot get any sustained writing via thrift
> for more than a few seconds before it all pauses.
>
> -chris
>
>
> On Tue, Jan 11, 2011 at 10:18 AM, Chirstopher Tarnas <[email protected]> wrote:
>
>> Hi Stack,
>>
>> Thanks for taking a look. I think I caught a regionserver compacting:
>>
>> http://pastebin.com/y9BQaVeJ
>>
>> http://pastebin.com/ZMxwEX5j
>>
>> thanks again,
>> -chris
>>
>> On Mon, Jan 10, 2011 at 1:52 PM, Stack <[email protected]> wrote:
>>
>>> Odd. Mind thread dumping the regionserver a few times and
>>> pastebining it during a compaction so we can see where its spending
>>> time? (Your compaction numbers are bad).
>>>
>>> St.Ack
>>>
>>> On Fri, Jan 7, 2011 at 11:07 PM, Chris Tarnas <[email protected]> wrote:
>>> > Thanks in advance for any help. I've been quite pleased with Hbase for
>>> this current project and until this problem it has worked quite well.
>>> >
>>> > Test cluster setup is CDH3b3 on a 7 nodes:
>>> > 5 data nodes with 48GB RAM, 8 cores, 4 disks,
>>> > 2 masters with 8 cores, 2 disks 24GB RAM for master/zookeeper/namenode
>>> >
>>> > My hbase.hregion.max.filesize is set to 1GB, ulimit files to 32k and
>>> xceivers to 4096, hbase heap is at 8GB.
>>> >
>>> > I'm testing out using GZ compression on two tables, each is currently
>>> still only one region. My tests runs fine when compression is off so this is
>>> definitely related to compression. When I start loading data (via thrift,
>>> many clients) it loads great for a while then the region servers slow to
>>> crawl. When this happens the two regionservers that are hosting the tables
>>> use ~ 110-160% CPU and block writes. One regionserver has occasional bursts
>>> of activity but mostly is very repetitive, here is a sample of the log:
>>> >
>>> > http://pastebin.com/WSc8aZFQ
>>> >
>>> > The other active regionserver looks to be continuously compacting:
>>> >
>>> > http://pastebin.com/3ifVKaX2
>>> >
>>> >
>>> > The master log is quite boring with this being repeated:
>>> >
>>> > 2011-01-08 00:48:58,419 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner
>>> scanning meta region {server: 10.56.24.8:60020, regionname:
>>> -ROOT-,,0.70236052, startKey: <>}
>>> > 2011-01-08 00:48:58,424 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan
>>> of 1 row(s) of meta region {server: 10.56.24.8:60020, regionname:
>>> -ROOT-,,0.70236052, startKey: <>} complete
>>> > 2011-01-08 00:48:58,444 INFO
>>> org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead,
>>> average load 1.6
>>> > 2011-01-08 00:49:04,810 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
>>> scanning meta region {server: 10.56.24.7:60020, regionname:
>>> .META.,,1.1028785192, startKey: <>}
>>> > 2011-01-08 00:49:04,820 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan
>>> of 6 row(s) of meta region {server: 10.56.24.7:60020, regionname:
>>> .META.,,1.1028785192, startKey: <>} complete
>>> > 2011-01-08 00:49:04,820 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
>>> >
>>> >
>>> > At this point loading slows to a trickle (requests are 0 in the web
>>> ui), I can see infrequent bursts of loading but very small amounts. Each
>>> table only has one region (and there are only two other tables, each also
>>> with only one region).
>>> >
>>> > I've compiled and tested the native GZ compression codecs on the nodes
>>> and the nodes have plenty of CPU, IO and memory available and no swapping.
>>> Any suggestions? Please let me know if you need any other info.
>>> >
>>> > thanks!
>>> > -chris
>>
>>
>
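P.S. For anyone trying to reproduce the earlier compression runs: GZ was enabled per column family at table creation time, roughly like this in the HBase shell (the table and family names here are placeholders, not my actual schema):

  create 'test_table', {NAME => 'cf', COMPRESSION => 'GZ'}

and to flip an existing table the usual disable/alter/enable dance applies, with a major compaction so existing store files get rewritten with the new setting:

  disable 'test_table'
  alter 'test_table', {NAME => 'cf', COMPRESSION => 'GZ'}
  enable 'test_table'
  major_compact 'test_table'

For the most recent runs described above, compression was simply left off at create time.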
