More details on what I am seeing:

I set the region size back to the default (256MB) and got much better
performance with fewer pauses for compaction. I loaded until I hit about 150
total regions in the table I am loading now (30 per regionserver), then set
hbase.hregion.max.filesize back up to 1GB (1073741824 is the actual value I
used). After restarting the cluster I ran another load test: many more
pauses for compactions that halted the whole cluster, and I got roughly 50%
of the write speed I had before. Compression was not enabled.
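
For reference, the region size in both runs was controlled with
hbase.hregion.max.filesize in hbase-site.xml; the second run used the 1GB
value below (the first run just used the 256MB default):

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
  </property>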

thanks for any help,
-chris

On Wed, Jan 12, 2011 at 3:46 PM, Chirstopher Tarnas <[email protected]> wrote:

> I'm doing a test now w/o any GZ compression enabled and I am seeing the
> same pauses in loading... any more ideas? I will try dropping my region size
> down to 256 MB next. Currently I cannot get any sustained writing via thrift
> for more than a few seconds before it all pauses.
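>
> For reference, each loader client is doing roughly this (a minimal sketch
> against the raw Thrift bindings generated from Hbase.thrift; the table
> name, column family, row keys, and values here are placeholders rather
> than my real schema):
>
>   from thrift.transport import TSocket, TTransport
>   from thrift.protocol import TBinaryProtocol
>   # gen-py bindings from Hbase.thrift, assumed to be on PYTHONPATH
>   from hbase import Hbase
>   from hbase.ttypes import Mutation
>
>   sock = TSocket.TSocket('regionserver-host', 9090)  # default thrift port
>   transport = TTransport.TBufferedTransport(sock)
>   client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
>   transport.open()
>
>   # one put per row; the real loaders run many of these clients in parallel
>   for i in range(100000):
>       client.mutateRow('test_table', 'row-%010d' % i,
>                        [Mutation(column='f:payload', value='x' * 200)])
>
>   transport.close()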
>
> -chris
>
>
> On Tue, Jan 11, 2011 at 10:18 AM, Chirstopher Tarnas <[email protected]> wrote:
>
>> Hi Stack,
>>
>> Thanks for taking a look. I think I caught a regionserver compacting:
>>
>> http://pastebin.com/y9BQaVeJ
>>
>> http://pastebin.com/ZMxwEX5j
>>
>> thanks again,
>> -chris
>>
>> On Mon, Jan 10, 2011 at 1:52 PM, Stack <[email protected]> wrote:
>>
>>> Odd.  Mind thread dumping the regionserver a few times and
>>> pastebining it during a compaction so we can see where it's spending
>>> time?  (Your compaction numbers are bad).
>>>
>>> St.Ack
>>>
>>> On Fri, Jan 7, 2011 at 11:07 PM, Chris Tarnas <[email protected]> wrote:
>>> > Thanks in advance for any help. I've been quite pleased with Hbase for
>>> this current project and until this problem it has worked quite well.
>>> >
>>> > Test cluster setup is CDH3b3 on 7 nodes:
>>> > 5 data nodes with 48GB RAM, 8 cores, 4 disks,
>>> > 2 masters with 8 cores, 2 disks, 24GB RAM for master/zookeeper/namenode
>>> >
>>> > My hbase.hregion.max.filesize is set to 1GB, ulimit files to 32k and
>>> xceivers to 4096, hbase heap is at 8GB.
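>>> >
>>> > In case the exact settings matter: xceivers is dfs.datanode.max.xcievers
>>> > in hdfs-site.xml, and the heap is HBASE_HEAPSIZE in hbase-env.sh, roughly:
>>> >
>>> >   <property>
>>> >     <name>dfs.datanode.max.xcievers</name>
>>> >     <value>4096</value>
>>> >   </property>
>>> >
>>> >   # hbase-env.sh
>>> >   export HBASE_HEAPSIZE=8192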
>>> >
>>> > I'm testing out GZ compression on two tables, each of which is currently
>>> still only one region. My tests run fine when compression is off, so this is
>>> definitely related to compression. When I start loading data (via thrift,
>>> many clients) it loads great for a while, then the regionservers slow to a
>>> crawl. When this happens the two regionservers that are hosting the tables
>>> use ~110-160% CPU and block writes. One regionserver has occasional bursts
>>> of activity but is mostly very repetitive; here is a sample of the log:
>>> >
>>> > http://pastebin.com/WSc8aZFQ
>>> >
>>> > The other active regionserver looks to be continuously compacting:
>>> >
>>> > http://pastebin.com/3ifVKaX2
>>> >
>>> >
>>> > The master log is quite boring with this being repeated:
>>> >
>>> > 2011-01-08 00:48:58,419 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner
>>> scanning meta region {server: 10.56.24.8:60020, regionname:
>>> -ROOT-,,0.70236052, startKey: <>}
>>> > 2011-01-08 00:48:58,424 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan
>>> of 1 row(s) of meta region {server: 10.56.24.8:60020, regionname:
>>> -ROOT-,,0.70236052, startKey: <>} complete
>>> > 2011-01-08 00:48:58,444 INFO
>>> org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead,
>>> average load 1.6
>>> > 2011-01-08 00:49:04,810 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
>>> scanning meta region {server: 10.56.24.7:60020, regionname:
>>> .META.,,1.1028785192, startKey: <>}
>>> > 2011-01-08 00:49:04,820 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan
>>> of 6 row(s) of meta region {server: 10.56.24.7:60020, regionname:
>>> .META.,,1.1028785192, startKey: <>} complete
>>> > 2011-01-08 00:49:04,820 INFO
>>> org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
>>> >
>>> >
>>> > At this point loading slows to a trickle (requests are 0 in the web
>>> UI); I can see infrequent bursts of loading, but only very small amounts.
>>> Each table only has one region (and there are only two other tables, each
>>> also with only one region).
>>> >
>>> > I've compiled and tested the native GZ compression codecs on the nodes,
>>> and the nodes have plenty of CPU, IO, and memory available, with no swapping.
>>> Any suggestions? Please let me know if you need any other info.
>>> >
>>> > thanks!
>>> > -chris
>>>
>>
>>
>
