I changed the settings as described below:

hbase.hstore.blockingStoreFiles=20
hbase.hregion.memstore.block.multiplier=4
MAX_FILESIZE=512mb
MEMSTORE_FLUSHSIZE=128mb
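For reference, here is a rough sketch of how the two table-level values and the initial six-region pre-split mentioned in the next paragraph can be set up with the 0.90 Java client. The table name, column family, and split keys below are made up for illustration; hbase.hstore.blockingStoreFiles and hbase.hregion.memstore.block.multiplier are server-side settings and belong in hbase-site.xml on the region servers (see the <name>/<value> snippets quoted further down), not in client code.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table and family names.
    HTableDescriptor desc = new HTableDescriptor("test");
    desc.setMaxFileSize(512L * 1024 * 1024);        // MAX_FILESIZE = 512mb
    desc.setMemStoreFlushSize(128L * 1024 * 1024);  // MEMSTORE_FLUSHSIZE = 128mb
    desc.addFamily(new HColumnDescriptor("f"));

    // Five split keys give six initial regions; real keys depend on your row key space.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("1"), Bytes.toBytes("2"), Bytes.toBytes("3"),
        Bytes.toBytes("4"), Bytes.toBytes("5")
    };
    admin.createTable(desc, splits);
  }
}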
I also created the table with 6 regions initially. Before, I wasn't pre-creating any regions. I needed to make all of these changes together to entirely eliminate the very long pauses; now there are no pauses much longer than a second. Thanks much for the help. I am still not entirely sure why compression seems to expose this problem, however.

On Mar 14, 2011, at 11:54 AM, Jean-Daniel Cryans wrote:

> Alright so here's a preliminary report:
>
> - No compression is stable for me too, short pauses.
> - LZO gave me no problems either, generally faster than no compression.
> - GZ initially gave me weird results, but I quickly saw that I forgot
> to copy over the native libs from the hadoop folder, so my logs were
> full of:
>
> 2011-03-14 10:20:29,624 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,626 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,628 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,630 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,632 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,634 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,636 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
>
> I copied the libs over, bounced the region servers, and the
> performance was much more stable until a point where I got a 20
> second pause, and looking at the logs I see:
>
> 2011-03-14 10:31:17,625 WARN
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
> test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744. has too many
> store files; delaying flush up to 90000ms
>
> (our config sets the block at 20 store files instead of the default,
> which is around 12 IIRC)
>
> Quickly followed by a bunch of:
>
> 2011-03-14 10:31:26,757 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
> 'IPC Server handler 20 on 60020' on region
> test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744.: memstore size
> 285.6m is >= than blocking 256.0m size
>
> (our settings make it so we won't block on memstores until 4x their
> size; in your case you may see the default 2x blocking factor, so 128MB)
>
> The reason is that our memstores, once flushed, occupy a very small
> space. Consider this:
>
> 2011-03-14 10:31:16,606 INFO
> org.apache.hadoop.hbase.regionserver.Store: Added
> hdfs://sv2borg169:9000/hbase/test/9d0eb095b77716c22cd5c78bb503c744/test/420552941380451032,
> entries=216000, sequenceid=70556635737, memsize=64.3m, filesize=6.0m
>
> It means that it will create tiny files of ~6MB, and the compactor will
> spend all its time merging those files until a point where HBase must
> stop inserting in order to not blow its available memory. Thus, the
> same data will get rewritten a couple of times.
>
> Normally, and by that I mean a system where you're not just trying to
> insert data ASAP but where most of your workload is made up of reads,
> this works well as the memstores are filled much more slowly and
> compactions happen at a normal pace.
>
> If you search around the interwebs for tips on speeding up HBase
> inserts, you'll often see the configs I referred to earlier:
>
> <name>hbase.hstore.blockingStoreFiles</name>
> <value>20</value>
> and
> <name>hbase.hregion.memstore.block.multiplier</name>
> <value>4</value>
>
> They should work pretty well for most use cases that are made up of heavy
> writes, given that the region servers have enough heap (e.g. more than 3
> or 4GB). You should also consider setting MAX_FILESIZE to >1GB to
> limit the number of regions, and MEMSTORE_FLUSHSIZE to >128MB to flush
> bigger files.
>
> Hope this helps,
>
> J-D
>
> On Mon, Mar 14, 2011 at 10:29 AM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>> Thanks for the report Bryan, I'll try your little program against one
>> of our 0.90.1 clusters that has similar hardware.
>>
>> J-D
>>
>> On Sun, Mar 13, 2011 at 1:48 PM, Bryan Keller <brya...@gmail.com> wrote:
>>> If interested, I wrote a small program that demonstrates the problem
>>> (http://vancameron.net/HBaseInsert.zip). It uses Gradle, so you'll need
>>> that. To run, enter "gradle run".
>>>
>>> On Mar 13, 2011, at 12:14 AM, Bryan Keller wrote:
>>>
>>>> I am using the Java client API to write 10,000 rows with about 6,000
>>>> columns each, via 8 threads making multiple calls to the
>>>> HTable.put(List<Put>) method. I start with an empty table with one column
>>>> family and no regions pre-created.
>>>>
>>>> With compression turned off, I am seeing very stable performance. At the
>>>> start there are a couple of 10-20 sec pauses where all insert threads are
>>>> blocked during a region split. Subsequent splits do not cause all of the
>>>> threads to block, presumably because there are more regions, so no one
>>>> region split blocks all inserts. GCs for HBase during the insert are not a
>>>> major problem (6k/55 sec).
>>>>
>>>> When using either LZO or gzip compression, however, I am seeing frequent
>>>> and long pauses, sometimes around 20 sec but often over 80 seconds in my
>>>> test. During these pauses all 8 of the threads writing to HBase are
>>>> blocked. The pauses happen throughout the insert process. GCs are higher
>>>> in HBase when using compression (60k, 4 min), but that doesn't seem enough to
>>>> explain these pauses. Overall performance obviously suffers dramatically
>>>> as a result (about 2x slower).
>>>>
>>>> I have tested this in different configurations (single node, 4 nodes) with
>>>> the same result. I'm using HBase 0.90.1 (CDH3B4), Sun/Oracle Java
>>>> 1.6.0_24, CentOS 5.5, Hadoop LZO 0.4.10 from Cloudera. The machines have 12
>>>> cores and 24 GB of RAM. Settings are pretty much default, nothing out of
>>>> the ordinary. I tried playing around with region handler count and
>>>> memstore settings, but these had no effect.
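For anyone curious what this kind of insert load looks like without pulling down the zip, a minimal sketch against the 0.90 Java client follows. The table name ("test"), column family, row and column naming, batch size, and cell payload are placeholders for illustration, not the values from the actual test program.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InsertLoad {
  static final int THREADS = 8;
  static final int ROWS = 10000;
  static final int COLS = 6000;
  static final byte[] FAMILY = Bytes.toBytes("f"); // hypothetical family name
  static final byte[] VALUE = new byte[16];        // made-up cell payload

  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    ExecutorService pool = Executors.newFixedThreadPool(THREADS);
    for (int t = 0; t < THREADS; t++) {
      final int thread = t;
      pool.submit(new Runnable() {
        public void run() {
          try {
            // HTable is not thread-safe; give each thread its own instance.
            HTable table = new HTable(conf, "test");
            List<Put> batch = new ArrayList<Put>();
            // Each thread writes its share of the 10,000 rows.
            for (int r = thread; r < ROWS; r += THREADS) {
              Put put = new Put(Bytes.toBytes(String.format("row-%05d", r)));
              for (int c = 0; c < COLS; c++) {
                put.add(FAMILY, Bytes.toBytes("col-" + c), VALUE);
              }
              batch.add(put);
              if (batch.size() == 10) {
                table.put(batch);  // send every 10 rows as one multi-put
                batch.clear();
              }
            }
            if (!batch.isEmpty()) table.put(batch);
            table.close();
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}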