Alright so here's a preliminary report:

- No compression is stable for me too, short pauses.
- LZO gave me no problems either, generally faster than no compression.
- GZ initially gave me weird results, but I quickly saw that I had
forgotten to copy over the native libs from the Hadoop folder, so my
logs were full of:

2011-03-14 10:20:29,624 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2011-03-14 10:20:29,626 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2011-03-14 10:20:29,628 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2011-03-14 10:20:29,630 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2011-03-14 10:20:29,632 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2011-03-14 10:20:29,634 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2011-03-14 10:20:29,636 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor

I copied the libs over, bounced the region servers, and performance
was much more stable until I hit a 20-second pause; looking at the
logs I see:

2011-03-14 10:31:17,625 WARN
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744. has too many
store files; delaying flush up to 90000ms

(our config sets the blocking threshold at 20 store files instead of
the default, which is around 12 IIRC)

Quickly followed by a bunch of:

2011-03-14 10:31:26,757 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
'IPC Server handler 20 on 60020' on region
test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744.: memstore size
285.6m is >= than blocking 256.0m size

(our settings make it so we don't block on memstores until they reach
4x their flush size; with the 64MB default flush size that's the
256MB blocking size in the log above. In your case you'll likely see
the default 2x blocking factor, so 128MB.)

The reason is that our memstores, once flushed, occupy very little
space on disk. Consider this:

2011-03-14 10:31:16,606 INFO
org.apache.hadoop.hbase.regionserver.Store: Added
hdfs://sv2borg169:9000/hbase/test/9d0eb095b77716c22cd5c78bb503c744/test/420552941380451032,
entries=216000, sequenceid=70556635737, memsize=64.3m, filesize=6.0m

It means that flushing creates tiny files of ~6MB, and the compactor
spends all its time merging those files until HBase must stop
accepting inserts in order not to blow its available memory. Thus,
the same data gets rewritten a couple of times.
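
Back-of-envelope with the numbers above: each flush turns ~64MB of
memstore into a ~6MB gzipped file, so by the time a store hits our
20-file blocking threshold there's only ~120MB on disk even though
~1.3GB worth of memstore data was flushed, and the compactor keeps
merging and re-merging those same small files.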

Normally, and by that I mean on a system where you're not just trying
to insert data as fast as possible but where most of the workload is
reads, this works well since the memstores fill much more slowly and
compactions keep up at a normal pace.

If you search around the interwebs for tips on speeding up HBase
inserts, you'll often see the configs I referred to earlier:

  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>
  </property>
and
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>
  </property>

They should work pretty well for most write-heavy use cases, given
that the region servers have enough heap (e.g. more than 3 or 4GB).
You should also consider setting MAX_FILESIZE to >1GB to limit the
number of regions and MEMSTORE_FLUSHSIZE to >128MB to flush bigger
files.
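
In case it's useful, here's a minimal sketch of what that looks like
at table-creation time with the 0.90 Java client API. The "test"
table and "test" family just mirror the log excerpts above, and the
sizes are examples rather than tuned recommendations, so adjust them
for your own schema and hardware:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.hfile.Compression;

  public class CreateWriteHeavyTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);

      HTableDescriptor desc = new HTableDescriptor("test");
      // Bigger regions mean fewer splits during a heavy load (MAX_FILESIZE > 1GB).
      desc.setMaxFileSize(2L * 1024 * 1024 * 1024);
      // Bigger flushes mean bigger store files and less compaction
      // churn (MEMSTORE_FLUSHSIZE > 128MB).
      desc.setMemStoreFlushSize(256L * 1024 * 1024);

      HColumnDescriptor family = new HColumnDescriptor("test");
      // LZO needs the hadoop-lzo native libs on every region server.
      family.setCompressionType(Compression.Algorithm.LZO);
      desc.addFamily(family);

      admin.createTable(desc);
    }
  }

You can set the same attributes on an existing table with the shell's
alter command, though in 0.90 the table has to be disabled first.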

Hope this helps,

J-D

On Mon, Mar 14, 2011 at 10:29 AM, Jean-Daniel Cryans
<jdcry...@apache.org> wrote:
> Thanks for the report Bryan, I'll try your little program against one
> of our 0.90.1 cluster that has similar hardware.
>
> J-D
>
> On Sun, Mar 13, 2011 at 1:48 PM, Bryan Keller <brya...@gmail.com> wrote:
>> If interested, I wrote a small program that demonstrates the problem 
>> (http://vancameron.net/HBaseInsert.zip). It uses Gradle, so you'll need 
>> that. To run, enter "gradle run".
>>
>> On Mar 13, 2011, at 12:14 AM, Bryan Keller wrote:
>>
>>> I am using the Java client API to write 10,000 rows with about 6000 columns 
>>> each, via 8 threads making multiple calls to the HTable.put(List<Put>) 
>>> method. I start with an empty table with one column family and no regions 
>>> pre-created.
>>>
>>> With compression turned off, I am seeing very stable performance. At the
>>> start there are a couple of 10-20 sec pauses where all insert threads are
>>> blocked during a region split. Subsequent splits do not cause all of the
>>> threads to block, presumably because there are more regions so no one
>>> region split blocks all inserts. GCs for HBase during the insert are not a
>>> major problem (6k/55sec).
>>>
>>> When using either LZO or gzip compression, however, I am seeing frequent 
>>> and long pauses, sometimes around 20 sec but often over 80 seconds in my 
>>> test. During these pauses all 8 of the threads writing to HBase are 
>>> blocked. The pauses happen throughout the insert process. GCs are higher in 
>>> HBase when using compression (60k, 4min), but it doesn't seem enough to 
>>> explain these pauses. Overall performance obviously suffers dramatically as 
>>> a result (about 2x slower).
>>>
>>> I have tested this in different configurations (single node, 4 nodes) with
>>> the same result. I'm using HBase 0.90.1 (CDH3B4), Sun/Oracle Java 1.6.0_24,
>>> CentOS 5.5, Hadoop LZO 0.4.10 from Cloudera. Machines have 12 cores and 24
>>> GB of RAM. Settings are pretty much default, nothing out of the ordinary. I
>>> tried playing around with region handler count and memstore settings, but
>>> these had no effect.
>>>
>>
>>
>
