Thanks Sandy.

Does setting -XX:MaxDirectMemorySize help trigger a GC when you're reaching 
that limit? Or does it just OOME before the actual RAM is exhausted (which at 
least prevents swapping, so that's nicer)?

I guess LZO is not a one-size-fits-all solution, but we do a lot of random 
reads and latency can be an issue for us, so I suppose we have to stick with it.


Friso



On 5 jan 2011, at 20:36, Sandy Pratt wrote:

> I was in a similar situation recently, with similar symptoms, and I 
> experienced a crash very similar to yours.  I don't have the specifics handy 
> at the moment, but I did post to this list about it a few weeks ago.  My 
> workload is fairly write-heavy.  I write about 10-20 million smallish 
> protobuf/xml blobs per day to an HBase cluster of 12 very underpowered 
> machines.
> 
> I received two suggestions: 1) update to the latest hadoop-lzo, and 2) specify 
> a max direct memory size for the JVM (e.g. -XX:MaxDirectMemorySize=256m).
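> 
> (For reference, roughly where that flag goes -- a sketch assuming the stock 
> hbase-env.sh and that your region servers pick up HBASE_REGIONSERVER_OPTS:)
> 
>   # hbase-env.sh
>   export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=256m"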
> 
> I took a third route - change my tables back to gz compression for the time 
> being while I figure out what to do.  Since then, my memory usage has been 
> rock steady, but more importantly my tables are roughly half the size on disk 
> that they were with LZO, and there has been no noticeable drop in performance 
> (but remember this is a write heavy workload, I'm not trying to serve an 
> online workload with low latency or anything like that).  At this point, I 
> might not return to LZO.
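> 
> (Concretely, something along these lines in the HBase shell; the table and 
> family names are made up, and existing store files only pick up the new codec 
> once a compaction rewrites them:)
> 
>   hbase> disable 'mytable'
>   hbase> alter 'mytable', {NAME => 'cf', COMPRESSION => 'GZ'}
>   hbase> enable 'mytable'
>   hbase> major_compact 'mytable'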
> 
> In general, I'm not convinced that "use LZO" is good advice for all HBase 
> users.  For one thing, I think it assumes that all installations are focused 
> on low latency, which is not always the case (sometimes merely good latency is 
> enough and great latency is not needed).  For another, it assumes some things 
> about where the performance bottleneck lives.  For example, LZO performs well 
> in micro-benchmarks, but if you find yourself in an IO-bound batch processing 
> situation, you might be better served by a higher compression ratio, even if 
> it's more computationally expensive.
> 
> Sandy
> 
>> -----Original Message-----
>> From: Friso van Vollenhoven [mailto:[email protected]]
>> Sent: Tuesday, January 04, 2011 08:00
>> To: <[email protected]>
>> Subject: Re: problem with LZO compressor on write only loads
>> 
>> I ran the job again, but with fewer other processes running on the same
>> machine, so with more physical memory available to HBase. This was to see
>> whether there was a point where it would stop allocating more buffers.
>> When I did this, after many hours, one of the RSes crashed with an OOME. See
>> here:
>> 
>> 2011-01-04 11:32:01,332 FATAL
>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>> server serverName=w5r1.inrdb.ripe.net,60020,1294091507228,
>> load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000):
>> Uncaught exception in service thread regionserver60020.compactor
>> java.lang.OutOfMemoryError: Direct buffer memory
>>        at java.nio.Bits.reserveMemory(Bits.java:633)
>>        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>        at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>        at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>        at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>> 2011-01-04 11:32:01,369 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>> request=0.0, regions=258, stores=516, storefiles=186,
>> storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2,
>> usedHeap=1797, maxHeap=16000, blockCacheSize=55051488,
>> blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0,
>> blockCacheMissCount=2397107, blockCacheEvictedCount=0,
>> blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>> 
>> I am guessing the OS won't allocate any more memory to the process. As you
>> can see, the used heap is nowhere near the max heap.
>> 
>> Also, it seems this happens during compaction. I had not considered compactions
>> as a suspect yet. I could try running with a larger compaction threshold and
>> blocking store files limit. Since this is a write-only load, that should not be
>> a problem. In our normal operation, compactions and splits are quite common,
>> though, because we do a lot of read-modify-write cycles. Is anyone else doing
>> update-heavy work with LZO?
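>> 
>> (Roughly the knobs I mean, assuming the standard property names; the values
>> are just an illustration:)
>> 
>>   <!-- hbase-site.xml -->
>>   <property>
>>     <name>hbase.hstore.compactionThreshold</name>
>>     <value>6</value>  <!-- default is 3 -->
>>   </property>
>>   <property>
>>     <name>hbase.hstore.blockingStoreFiles</name>
>>     <value>20</value>  <!-- default is 7 -->
>>   </property>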
>> 
>> 
>> Cheers,
>> Friso
>> 
>> 
>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>> 
>>> Fishy. Are your cells particularly large? Or have you tuned the HFile
>>> block size at all?
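>>> 
>>> (For clarity, by block size I mean the per-family HFile block size, e.g. as
>>> set at table creation time; the names here are just an example:)
>>> 
>>>   hbase> create 'mytable', {NAME => 'cf', BLOCKSIZE => 65536}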
>>> 
>>> -Todd
>>> 
>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
>>> [email protected]> wrote:
>>> 
>>>> I tried it, but it doesn't seem to help. The RS processes grow to
>>>> 30GB within minutes of the job starting.
>>>> 
>>>> Any ideas?
>>>> 
>>>> 
>>>> Friso
>>>> 
>>>> 
>>>> 
>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>> 
>>>>> Hi Friso,
>>>>> 
>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>> 
>>>>> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
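>>>>> 
>>>>> (For example exported from hbase-env.sh before starting the region servers;
>>>>> a sketch, and it only matters on a glibc new enough to have per-thread
>>>>> malloc arenas:)
>>>>> 
>>>>>   # hbase-env.sh
>>>>>   export MALLOC_ARENA_MAX=1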
>>>>> 
>>>>> Thanks
>>>>> -Todd
>>>>> 
>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I seem to run into a problem that occurs when using LZO compression on a
>>>>>> heavy write-only load. I am using 0.90 RC1 and, thus, the LZO compressor
>>>>>> code that supports the reinit() method (from Kevin Weil's github, version
>>>>>> 0.4.8). There are several Hadoop LZO incarnations around, so I am pointing
>>>>>> my question to this list.
>>>>>> 
>>>>>> It looks like the compressor uses direct byte buffers to store the original
>>>>>> and compressed bytes in memory, so the native code can work with them without
>>>>>> the JVM having to copy anything around. The direct buffers are possibly reused
>>>>>> after a reinit() call, but will often be newly created in the init() method,
>>>>>> because the existing buffer can be the wrong size for reuse. In the latter
>>>>>> case the buffers previously used by the compressor instance become eligible
>>>>>> for garbage collection. I think the problem is that this collection never
>>>>>> occurs (in time), because the GC does not consider it necessary yet: the GC
>>>>>> does not know about the native heap, and based on the state of the JVM heap
>>>>>> there is no reason to finalize these objects. However, direct byte buffers are
>>>>>> only freed in the finalizer, so the native heap keeps growing. On write-only
>>>>>> loads, a full GC will rarely happen, because heap usage will not grow far
>>>>>> beyond the mem stores (no block cache is used). So what happens is that the
>>>>>> machine starts swapping before the GC ever cleans up the direct byte buffers.
>>>>>> I am guessing that without the reinit() support, the buffers were collected
>>>>>> earlier, because the referring objects would also be collected every now and
>>>>>> then, or things would perhaps just never promote to an older generation.
>>>>>> 
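>>>>>> (A minimal stand-alone sketch of the pattern I mean; this is not the LZO code
>>>>>> itself, just an illustration that discarded direct buffers keep their native
>>>>>> memory until a GC actually collects them:)
>>>>>> 
>>>>>>   import java.nio.ByteBuffer;
>>>>>> 
>>>>>>   // Toy stand-in for the compressor: a size change forces a new direct
>>>>>>   // buffer, and the old one only returns its native memory when the GC
>>>>>>   // gets around to collecting it, which a quiet JVM heap rarely forces.
>>>>>>   public class DirectBufferChurn {
>>>>>>     private ByteBuffer buf;
>>>>>> 
>>>>>>     void init(int size) {
>>>>>>       if (buf == null || buf.capacity() != size) {
>>>>>>         buf = ByteBuffer.allocateDirect(size);  // previous buf is now unreachable
>>>>>>       }
>>>>>>     }
>>>>>> 
>>>>>>     public static void main(String[] args) {
>>>>>>       DirectBufferChurn c = new DirectBufferChurn();
>>>>>>       for (int i = 0; i < 64; i++) {
>>>>>>         c.init(8 * 1024 * 1024 + (i % 2) * 4096);  // alternate sizes to defeat reuse
>>>>>>       }
>>>>>>       System.gc();  // a full GC is what finally releases the lingering buffers
>>>>>>     }
>>>>>>   }
>>>>>> 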
>>>>>> When I do a pmap on a running RS after it has grown to some 40GB resident
>>>>>> size (with a 16GB heap), it shows a lot of near-64M anon blocks (presumably
>>>>>> native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but
>>>>>> that was under normal load. After that I went back to an HBase version that
>>>>>> does not require the reinit(). Now I am on 0.90 with the new LZO, but I never
>>>>>> did a heavy load like this one with it, until now...
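>>>>>> 
>>>>>> (What I am looking at, roughly; <regionserver-pid> is a placeholder and the
>>>>>> exact output format differs a bit per distro:)
>>>>>> 
>>>>>>   pmap <regionserver-pid> | grep anon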
>>>>>> 
>>>>>> Can anyone with a better understanding of the LZO code confirm that the
>>>>>> above could be the case? If so, would it be possible to change the LZO
>>>>>> compressor (and decompressor) to use just one fixed-size buffer (they all
>>>>>> appear to be near 64M anyway), or to reuse an existing buffer even when it
>>>>>> is not exactly the required size but large enough to make do? Having
>>>>>> short-lived direct byte buffers is apparently a discouraged practice. If
>>>>>> anyone can provide some pointers on what to look out for, I could invest
>>>>>> some time in creating a patch.
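>>>>>> 
>>>>>> (A sketch of what I mean by reusing a large-enough buffer; this is a
>>>>>> hypothetical helper, not the current hadoop-lzo code:)
>>>>>> 
>>>>>>   import java.nio.ByteBuffer;
>>>>>> 
>>>>>>   public final class BufferReuse {
>>>>>>     // Keep the existing direct buffer whenever its capacity suffices and only
>>>>>>     // adjust position/limit, instead of allocating a new buffer per reinit().
>>>>>>     static ByteBuffer ensureCapacity(ByteBuffer current, int needed) {
>>>>>>       if (current == null || current.capacity() < needed) {
>>>>>>         current = ByteBuffer.allocateDirect(needed);
>>>>>>       }
>>>>>>       current.clear();
>>>>>>       current.limit(needed);
>>>>>>       return current;
>>>>>>     }
>>>>>>   }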
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Friso
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
> 
