Hey Ryan,
I went back to the older version. Problem is that going to HBase 0.90 requires 
an API change on the compressor side, which forces you to a version newer than 
0.4.6 or so. So I also had to go back to HBase 0.89, which is in turn not 
compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89 is stable for 
us, so this is not a problem at all. But this LZO problem is really in the way 
of our projected upgrade path (my client would like to end up with CDH3 for 
everything, because of the support options available in case things go wrong 
and the Cloudera administration courses available when new ops people are 
hired).

Cheers,
Friso



On 7 jan 2011, at 22:28, Ryan Rawson wrote:

> Hey,
> 
> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
> I know some of the newer versions had bugs which leaked
> DirectByteBuffer space, which might be what you are running into.
> 
> Give the older version a shot; there really hasn't been much change in
> how LZO works in a while, and most of the 'extra' stuff added was to
> support features hbase does not use.
> 
> Good luck!
> 
> -ryan
> 
> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
> 
> 
> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven
> <[email protected]> wrote:
>> Thanks Sandy.
>> 
>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're 
>> reaching that limit? Or does it just OOME before the actual RAM is 
>> exhausted (which would at least prevent swapping, which is nicer)?
>> 
>> I guess LZO is not a solution that fits all, but we do a lot of random reads 
>> and latency can be an issue for us, so I suppose we have to stick with it.
>> 
>> 
>> Friso
>> 
>> 
>> 
>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>> 
>>> I was in a similar situation recently, with similar symptoms, and a 
>>> crash much like yours.  I don't have the specifics handy at the moment, 
>>> but I did post to this list about it a few weeks ago.  My workload is 
>>> fairly write-heavy.  I write about 10-20 million smallish protobuf/xml 
>>> blobs per day to an HBase cluster of 12 very underpowered machines.
>>> 
>>> I received two suggestions: 1) update to the latest hadoop-lzo, and 2) 
>>> specify a max direct memory size to the JVM (e.g. 
>>> -XX:MaxDirectMemorySize=256m).
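>>> 
>>> If it helps to see what that flag does in isolation, here is a tiny 
>>> standalone sketch (just an illustration, nothing from our setup): run 
>>> it with -XX:MaxDirectMemorySize=256m and it dies with the same "Direct 
>>> buffer memory" OOME after a handful of allocations, while the Java 
>>> heap is still nearly empty.
>>> 
>>>   import java.nio.ByteBuffer;
>>>   import java.util.ArrayList;
>>>   import java.util.List;
>>> 
>>>   public class DirectLimitDemo {
>>>     public static void main(String[] args) {
>>>       List<ByteBuffer> pinned = new ArrayList<ByteBuffer>();
>>>       while (true) {
>>>         // 64MB chunks, roughly the size the LZO compressor allocates
>>>         pinned.add(ByteBuffer.allocateDirect(64 * 1024 * 1024));
>>>         System.out.println("allocated " + (64 * pinned.size()) + "MB direct");
>>>       }
>>>     }
>>>   }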
>>> 
>>> I took a third route - change my tables back to gz compression for the time 
>>> being while I figure out what to do.  Since then, my memory usage has been 
>>> rock steady, but more importantly my tables are roughly half the size on 
>>> disk that they were with LZO, and there has been no noticeable drop in 
>>> performance (but remember this is a write heavy workload, I'm not trying to 
>>> serve an online workload with low latency or anything like that).  At this 
>>> point, I might not return to LZO.
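>>> 
>>> In case anyone wants to try the same switch, it was roughly this from 
>>> the HBase shell (table and family names are placeholders; the table 
>>> has to be disabled for the alter):
>>> 
>>>   disable 'mytable'
>>>   alter 'mytable', {NAME => 'cf', COMPRESSION => 'GZ'}
>>>   enable 'mytable'
>>>   major_compact 'mytable'
>>> 
>>> Existing store files keep the old codec until they are rewritten, 
>>> which is why the major compaction at the end is there.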
>>> 
>>> In general, I'm not convinced that "use LZO" is universally good advice 
>>> for all HBase users.  For one thing, it assumes that all installations 
>>> are focused on low latency, which is not always the case (sometimes 
>>> merely good latency is enough and great latency is not needed).  
>>> Secondly, it assumes some things about where the performance bottleneck 
>>> lives.  For example, LZO performs well in micro-benchmarks, but if you 
>>> find yourself in an IO-bound batch processing situation, you might be 
>>> better served by a higher compression ratio, even if it's more 
>>> computationally expensive.
>>> 
>>> Sandy
>>> 
>>>> -----Original Message-----
>>>> From: Friso van Vollenhoven [mailto:[email protected]]
>>>> Sent: Tuesday, January 04, 2011 08:00
>>>> To: <[email protected]>
>>>> Subject: Re: problem with LZO compressor on write only loads
>>>> 
>>>> I ran the job again, but with fewer other processes running on the same
>>>> machine, so with more physical memory available to HBase. This was to see
>>>> whether there was a point where it would stop allocating more buffers.
>>>> When I did this, after many hours, one of the RSes crashed with an OOME.
>>>> See here:
>>>> 
>>>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
>>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>>       at java.nio.Bits.reserveMemory(Bits.java:633)
>>>>       at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>>       at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>>       at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>>       at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>>>       at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>>       at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>>       at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>>>       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>>       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>>       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>>       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>>       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>>       at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>>       at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>>       at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>>>       at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>>>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>> 
>>>> I am guessing the OS won't allocate any more memory to the process. As you
>>>> can see, the used heap is nowhere near the max heap.
>>>> 
>>>> Also, this happens during compaction, it seems. I had not considered
>>>> compactions as a suspect yet. I could try running with a larger
>>>> compaction threshold and a higher blocking store files limit (see the
>>>> properties below). Since this is a write-only load, that should not be a
>>>> problem. In our normal operation, compactions and splits are quite
>>>> common, though, because we do a lot of read-modify-write cycles. Anyone
>>>> else doing update-heavy work with LZO?
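>>>> 
>>>> If I have the property names right, these are the knobs I mean, set in
>>>> hbase-site.xml (the values here are just placeholders, not a
>>>> recommendation):
>>>> 
>>>>   <property>
>>>>     <name>hbase.hstore.compactionThreshold</name>
>>>>     <value>6</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.hstore.blockingStoreFiles</name>
>>>>     <value>20</value>
>>>>   </property>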
>>>> 
>>>> 
>>>> Cheers,
>>>> Friso
>>>> 
>>>> 
>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>> 
>>>>> Fishy. Are your cells particularly large? Or have you tuned the HFile
>>>>> block size at all?
>>>>> 
>>>>> -Todd
>>>>> 
>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> I tried it, but it doesn't seem to help. The RS processes grow to
>>>>>> 30GB within minutes of the job starting.
>>>>>> 
>>>>>> Any ideas?
>>>>>> 
>>>>>> 
>>>>>> Friso
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>> 
>>>>>>> Hi Friso,
>>>>>>> 
>>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>> 
>>>>>>> Can you try running with the environment variable
>>>>>>> MALLOC_ARENA_MAX=1 set?
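>>>>>>> 
>>>>>>> (e.g. in hbase-env.sh, or wherever you set up the region server's
>>>>>>> environment:
>>>>>>> 
>>>>>>>   export MALLOC_ARENA_MAX=1
>>>>>>> 
>>>>>>> glibc's per-thread malloc arenas are a common source of ~64MB anon
>>>>>>> blocks on 64-bit, so capping the arena count is worth ruling out.)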
>>>>>>> 
>>>>>>> Thanks
>>>>>>> -Todd
>>>>>>> 
>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I seem to run into a problem that occurs when using LZO compression
>>>>>>>> on a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO
>>>>>>>> compressor code that supports the reinit() method (version 0.4.8 from
>>>>>>>> Kevin Weil's github). There are some more Hadoop LZO incarnations, so
>>>>>>>> I am pointing my question to this list.
>>>>>>>> 
>>>>>>>> It looks like the compressor uses direct byte buffers to store the
>>>>>>>> original and compressed bytes in memory, so the native code can work
>>>>>>>> with them without the JVM having to copy anything around. The direct
>>>>>>>> buffers are possibly reused after a reinit() call, but will often be
>>>>>>>> newly created in the init() method, because the existing buffer can
>>>>>>>> be the wrong size for reuse. The latter case leaves the buffers
>>>>>>>> previously used by the compressor instance eligible for garbage
>>>>>>>> collection. I think the problem is that this collection never occurs
>>>>>>>> (in time), because the GC does not consider it necessary yet. The GC
>>>>>>>> does not know about the native heap, and based on the state of the
>>>>>>>> JVM heap there is no reason to finalize these objects yet. However,
>>>>>>>> direct byte buffers are only freed in their finalizer, so the native
>>>>>>>> heap keeps growing. On write-only loads, a full GC will rarely
>>>>>>>> happen, because the heap will not grow far beyond the memstores (no
>>>>>>>> block cache is used). So what happens is that the machine starts
>>>>>>>> using swap before the GC ever cleans up the direct byte buffers. I am
>>>>>>>> guessing that without the reinit() support the buffers were collected
>>>>>>>> earlier, because the referring objects would also be collected every
>>>>>>>> now and then, or would perhaps just never get promoted to an older
>>>>>>>> generation.
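>>>>>>>> 
>>>>>>>> To make the suspected pattern concrete, here is a paraphrase in
>>>>>>>> plain Java (this is not the actual hadoop-lzo source, just my
>>>>>>>> reading of what init()/reinit() effectively do):
>>>>>>>> 
>>>>>>>>   import java.nio.ByteBuffer;
>>>>>>>> 
>>>>>>>>   class LeakyCompressor {
>>>>>>>>     private ByteBuffer uncompressedBuf;
>>>>>>>>     private ByteBuffer compressedBuf;
>>>>>>>> 
>>>>>>>>     void init(int bufferSize) {
>>>>>>>>       if (uncompressedBuf == null
>>>>>>>>           || uncompressedBuf.capacity() != bufferSize) {
>>>>>>>>         // The old buffers become unreachable here, but their
>>>>>>>>         // native memory is only released once the GC runs their
>>>>>>>>         // finalizers -- which a write-only load rarely triggers.
>>>>>>>>         uncompressedBuf = ByteBuffer.allocateDirect(bufferSize);
>>>>>>>>         compressedBuf = ByteBuffer.allocateDirect(bufferSize);
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     void reinit(int bufferSize) {
>>>>>>>>       init(bufferSize); // reused only on an exact size match
>>>>>>>>     }
>>>>>>>>   }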
>>>>>>>> 
>>>>>>>> When I do a pmap on a running RS after it has grown to some 40GB
>>>>>>>> resident size (with a 16GB heap), it shows a lot of near-64MB anon
>>>>>>>> blocks (presumably native heap). I saw this before with the 0.4.6
>>>>>>>> version of Hadoop LZO, but that was under normal load. After that I
>>>>>>>> went back to an HBase version that does not require the reinit().
>>>>>>>> Now I am on 0.90 with the new LZO, but never did a heavy load like
>>>>>>>> this one with that, until now...
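>>>>>>>> 
>>>>>>>> (For anyone who wants to check their own region servers: something
>>>>>>>> like "pmap -x <RS pid> | grep anon" is how I look at it; the leak
>>>>>>>> shows up as page after page of ~64MB rw anon mappings.)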
>>>>>>>> 
>>>>>>>> Can anyone with a better understanding of the LZO code confirm that
>>>>>>>> the above could be the case? If so, would it be possible to change
>>>>>>>> the LZO compressor (and decompressor) to use maybe just one fixed
>>>>>>>> size buffer (they all appear near 64MB anyway), or possibly to reuse
>>>>>>>> an existing buffer even when it is not the exact required size but
>>>>>>>> just large enough to make do? Having short-lived direct byte buffers
>>>>>>>> is apparently a discouraged practice. If anyone can provide some
>>>>>>>> pointers on what to look out for, I could invest some time in
>>>>>>>> creating a patch.
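>>>>>>>> 
>>>>>>>> The reuse variant I have in mind would be something like this
>>>>>>>> (again just a sketch against the made-up class above, not a patch):
>>>>>>>> 
>>>>>>>>   void init(int bufferSize) {
>>>>>>>>     if (uncompressedBuf == null
>>>>>>>>         || uncompressedBuf.capacity() < bufferSize) {
>>>>>>>>       uncompressedBuf = ByteBuffer.allocateDirect(bufferSize);
>>>>>>>>     }
>>>>>>>>     // Make do with a larger buffer instead of reallocating:
>>>>>>>>     // reset it and cap the usable region at the requested size.
>>>>>>>>     uncompressedBuf.clear();
>>>>>>>>     uncompressedBuf.limit(bufferSize);
>>>>>>>>   }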
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Friso
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>> 
>> 
>> 
