Hey, here at SU we continue to use version 0.1.0 of hadoop-gpl-compression. I know some of the newer versions had bugs which leaked DirectByteBuffer space, which might be what you are running into.
Give the older version a shot; there really hasn't been much change in how LZO works in a while, and most of the 'extra' stuff added was to support features HBase does not use. Good luck!

-ryan

PS: http://code.google.com/p/hadoop-gpl-compression/downloads/list

On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven <[email protected]> wrote:
> Thanks Sandy.
>
> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?
>
> I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.
>
> Friso
>
> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>
>> I was in a similar situation recently, with similar symptoms, and I experienced a crash very similar to yours. I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago. My workload is fairly write-heavy. I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.
>>
>> The suggestions I received were two: 1) update to the latest hadoop-lzo and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).
>>
>> I took a third route - change my tables back to gz compression for the time being while I figure out what to do. Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write-heavy workload; I'm not trying to serve an online workload with low latency or anything like that). At this point, I might not return to LZO.
>>
>> In general, I'm not convinced that "use LZO" is universally good advice for all HBase users. For one thing, I think it assumes that all installations are focused on low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed). Secondly, it assumes some things about where the performance bottleneck lives. For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch processing situation, you might be better served by a higher compression ratio, even if it's more computationally expensive.
>>
>> Sandy
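(A quick aside on Friso's -XX:MaxDirectMemorySize question above: in the Sun JDK of that era, java.nio.Bits.reserveMemory() reacts to running out of reservable direct memory by calling System.gc(), sleeping briefly, and retrying once before it throws the "Direct buffer memory" OOME, so the flag does give the collector one chance to reclaim dead buffers first. If explicit GC is disabled with -XX:+DisableExplicitGC, that fallback is a no-op and you get the OOME straight away. The standalone sketch below reproduces the allocation pattern being discussed; it is not HBase or hadoop-lzo code and the class name is made up. With the flag set it either keeps running or fails fast at the limit, depending on whether that fallback GC is allowed; without the flag the resident size of the process just keeps growing.)

import java.nio.ByteBuffer;

// Standalone illustration only (made-up class name, not HBase code).
// Run with e.g.:  java -Xmx16g -XX:MaxDirectMemorySize=256m DirectBufferChurn
// Every iteration allocates a 64 MB direct buffer and immediately drops the
// reference, the same pattern as a compressor re-allocating its buffers in
// init(): the old buffers are garbage, but their native memory is only
// released once a GC actually runs and their cleaners fire.
public class DirectBufferChurn {
    public static void main(String[] args) {
        int i = 0;
        while (true) {
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            buf.put(0, (byte) 1); // write something into it, as a compressor would
            if (++i % 16 == 0) {
                System.out.println("allocated " + i + " x 64 MB of direct buffers so far");
            }
        }
    }
}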
>>> -----Original Message-----
>>> From: Friso van Vollenhoven [mailto:[email protected]]
>>> Sent: Tuesday, January 04, 2011 08:00
>>> To: <[email protected]>
>>> Subject: Re: problem with LZO compressor on write only loads
>>>
>>> I ran the job again, but with fewer other processes running on the same machine, so with more physical memory available to HBase. This was to see whether there was a point where it would stop allocating more buffers. When I do this, after many hours, one of the RSes crashed with an OOME. See here:
>>>
>>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>         at java.nio.Bits.reserveMemory(Bits.java:633)
>>>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>
>>> I am guessing the OS won't allocate any more memory to the process. As you can see, the used heap is nowhere near the max heap.
>>>
>>> Also, this happens from the compaction, it seems. I had not considered those as a suspect yet. I could try running with a larger compaction threshold and blocking store files. Since this is a write-only load, it should not be a problem. In our normal operation, compactions and splits are quite common, though, because we do read-modify-write cycles a lot. Anyone else doing update-heavy work with LZO?
>>>
>>> Cheers,
>>> Friso
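(A note on what that stack trace points at: every new HFile block written by the compaction borrows a compressor from CodecPool and re-initializes it, and, as Friso describes in his original mail further down the thread, an init() that decides the existing buffers are the wrong size allocates fresh direct buffers and orphans the old ones. The toy model below tries to make that churn concrete; the class and method names are invented and this is not the hadoop-lzo source.)

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only (invented names, not the real hadoop-lzo code): one pooled
// compressor object is reused for every block, but each re-init with a
// different buffer size replaces its direct buffers, leaving the old ones in
// native memory until a full GC lets their cleaners run.
public class CompressorChurnModel {

    static class ToyCompressor {
        ByteBuffer uncompressed;  // direct buffers, as in the LZO compressor
        ByteBuffer compressed;

        void init(int bufferSize) {
            // Re-allocate whenever the requested size is not exactly the
            // current capacity, the behaviour described in the thread.
            if (uncompressed == null || uncompressed.capacity() != bufferSize) {
                uncompressed = ByteBuffer.allocateDirect(bufferSize);
                compressed = ByteBuffer.allocateDirect(bufferSize);
            }
            uncompressed.clear();
            compressed.clear();
        }
    }

    public static void main(String[] args) {
        Deque<ToyCompressor> pool = new ArrayDeque<ToyCompressor>();
        pool.add(new ToyCompressor());  // a single pooled instance is enough

        // Two buffer sizes that differ slightly, so every init() misses the
        // exact-match test and re-allocates.
        int[] sizes = { 64 * 1024 * 1024, 64 * 1024 * 1024 + 4096 };

        // One iteration per HFile block written by a long compaction.
        for (int block = 0; block < 1000; block++) {
            ToyCompressor c = pool.poll();   // stands in for CodecPool.getCompressor()
            c.init(sizes[block % 2]);        // stands in for reinit()/init()
            // ... compress the block here ...
            pool.add(c);                     // stands in for CodecPool.returnCompressor()
        }
        // Only one compressor object ever exists, yet the loop has asked the
        // JVM for roughly 1000 x 2 x 64 MB of direct memory; whether that ends
        // in swapping or an OOME depends entirely on when the GC gets around
        // to the dead buffers.
    }
}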
>>>
>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>
>>>> Fishy. Are your cells particularly large? Or have you tuned the HFile block size at all?
>>>>
>>>> -Todd
>>>>
>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <[email protected]> wrote:
>>>>
>>>>> I tried it, but it doesn't seem to help. The RS processes grow to 30Gb in minutes after the job started.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Friso
>>>>>
>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>
>>>>>> Hi Friso,
>>>>>>
>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>
>>>>>> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
>>>>>>
>>>>>> Thanks
>>>>>> -Todd
>>>>>>
>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I seem to run into a problem that occurs when using LZO compression on a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my question to this list.
>>>>>>>
>>>>>>> It looks like the compressor uses direct byte buffers to store the original and compressed bytes in memory, so the native code can work with them without the JVM having to copy anything around. The direct buffers are possibly reused after a reinit() call, but will often be newly created in the init() method, because the existing buffer can be the wrong size for reusing. The latter case leaves the buffers previously used by the compressor instance eligible for garbage collection. I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet. The GC does not know about the native heap, and based on the state of the JVM heap there is no reason to finalize these objects yet. However, direct byte buffers are only freed in the finalizer, so the native heap keeps growing. On write-only loads, a full GC will rarely happen, because the heap will not grow far beyond the memstores (no block cache is used). So what happens is that the machine starts using swap before the GC ever cleans up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier because the referring objects would also be collected every now and then, or things would perhaps just never promote to an older generation.
>>>>>>>
>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb resident size (with a 16Gb heap), it shows a lot of near-64M anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to an HBase version that does not require the reinit(). Now I am on 0.90 with the new LZO, but never did a heavy load like this one with it, until now...
>>>>>>>
>>>>>>> Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use maybe just one fixed-size buffer (they all appear near 64M anyway), or possibly reuse an existing buffer also when it is not the exact required size but just large enough to make do? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Friso
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
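For what it's worth, the reuse strategy Friso proposes at the end (keep the existing buffer whenever it is at least as large as what init() asks for, instead of insisting on an exact size match) would look roughly like the sketch below. The names are placeholders and this is not a patch against the real LzoCompressor, just an illustration of the allocation decision.

import java.nio.ByteBuffer;

// Sketch of the "reuse if large enough" idea (placeholder names, not the real
// hadoop-lzo code): the buffer is only replaced when it is genuinely too
// small, so a pooled compressor settles on one allocation instead of churning
// through a new 64 MB direct buffer on every size change.
final class ReusableDirectBuffer {
    private ByteBuffer buffer;

    ByteBuffer get(int requiredSize) {
        if (buffer == null || buffer.capacity() < requiredSize) {
            // An exact-match test (capacity() != requiredSize) is what causes
            // the churn; allocating only when the buffer is too small avoids it.
            buffer = ByteBuffer.allocateDirect(requiredSize);
        }
        buffer.clear();
        buffer.limit(requiredSize); // expose only the window the caller asked for
        return buffer;
    }
}

Since the buffers in the pmap output all sit near 64 MB anyway, this ends up close to the single fixed-size buffer Friso also mentions as an option.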
