Hi Friso, 

So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what would 
happen if you replaced hadoop-core-*.jar in CDH3b3 with the one contained in the 
HBase 0.90RC distribution (hadoop-core-0.20-append-r1056497.jar) and then 
rebuilt hadoop-lzo against it. 

Here is the comment on the LzoCompressor#reinit() method: 

-----------------------------------
// ... this method isn't in vanilla 0.20.2, but is in CDH3b3 and YDH
  public void reinit(Configuration conf) {
-----------------------------------

https://github.com/kevinweil/hadoop-lzo/blob/6cbf4e232d7972c94107600567333a372ea08c0a/src/java/com/hadoop/compression/lzo/LzoCompressor.java#L196


I don't know if hadoop-core-0.20-append-r1056497.jar is vanilla 0.20.2 or 
more like CDH3b3. Maybe I'm wrong, but if it doesn't call reinit(), you'll have 
a good chance of getting a stable HBase 0.90.
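For what it's worth, here is a minimal, self-contained sketch of the leak pattern Friso describes. This is a hypothetical class, not the real LzoCompressor: it only mimics the reinit() -> init() path, where a size mismatch triggers a fresh allocateDirect() and the dropped buffers' native memory is reclaimed only when the GC eventually runs their finalizers.

```java
import java.nio.ByteBuffer;

// Minimal sketch (hypothetical names) of the suspected leak pattern:
// each reinit() with a different buffer size drops the old direct
// buffers, whose native memory is released only when the GC runs
// their finalizers -- which a write-only load rarely triggers.
public class ReinitLeakSketch {
    static int directAllocations = 0;  // counts native allocations

    static ByteBuffer allocate(int size) {
        directAllocations++;
        return ByteBuffer.allocateDirect(size);
    }

    ByteBuffer uncompressed, compressed;
    int bufferSize = -1;

    // Mirrors the reinit() -> init() behavior: buffers are reused
    // only when the requested size matches the current one exactly.
    void reinit(int requestedSize) {
        if (requestedSize != bufferSize) {
            uncompressed = allocate(requestedSize);  // old buffer is now
            compressed   = allocate(requestedSize);  // garbage, but its
            bufferSize   = requestedSize;            // native memory is
        } else {                                     // still held
            uncompressed.clear();
            compressed.clear();
        }
    }

    public static void main(String[] args) {
        ReinitLeakSketch c = new ReinitLeakSketch();
        // Alternating block sizes (e.g. flush vs. compaction) defeat reuse:
        for (int i = 0; i < 100; i++) {
            c.reinit(i % 2 == 0 ? 64 * 1024 : 256 * 1024);
        }
        // 200 direct allocations for what is nominally a pair of buffers.
        System.out.println("direct allocations: " + directAllocations);
    }
}
```

If the pooled compressor is never asked to reinit() with a new size (as in vanilla 0.20.2), the same two buffers would simply be reused.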

Good luck! 

Tatsuya 

--
Tatsuya Kawano (Mr.)
Tokyo, Japan

http://twitter.com/#!/tatsuya6502




On 01/08/2011, at 6:33 PM, Friso van Vollenhoven wrote:

> Hey Ryan,
> I went back to the older version. Problem is that going to HBase 0.90 
> requires an API change on the compressor side, which forces you to a version 
> newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is 
> again not compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89 is 
> stable for us, so this is not at all a problem. But this LZO problem is 
> really in the way of our projected upgrade path (my client would like to end 
> up running CDH3 for everything, because of the support options available in 
> case things go wrong and the Cloudera administration courses available when 
> new ops people are hired).
> 
> Cheers,
> Friso
> 
> 
> 
> On 7 jan 2011, at 22:28, Ryan Rawson wrote:
> 
>> Hey,
>> 
>> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
>> I know some of the newer versions had bugs which leaked
>> DirectByteBuffer space, which might be what you are running in to.
>> 
>> Give the older version a shot; there really hasn't been much change in
>> how LZO works in a while, and most of the 'extra' stuff added was to
>> support features HBase does not use.
>> 
>> Good luck!
>> 
>> -ryan
>> 
>> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
>> 
>> 
>> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven
>> <[email protected]> wrote:
>>> Thanks Sandy.
>>> 
>>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're 
>>> reaching that limit? Or does it just OOME before the actual RAM is 
>>> exhausted (then you prevent swapping, which is nicer, though)?
>>> 
>>> I guess LZO is not a solution that fits all, but we do a lot of random 
>>> reads and latency can be an issue for us, so I suppose we have to stick 
>>> with it.
>>> 
>>> 
>>> Friso
>>> 
>>> 
>>> 
>>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>>> 
>>>> I was in a similar situation recently, with similar symptoms, and I 
>>>> experienced a crash very similar to yours.  I don't have the specifics 
>>>> handy at the moment, but I did post to this list about it a few weeks ago. 
>>>>  My workload is fairly write-heavy.  I write about 10-20 million smallish 
>>>> protobuf/xml blobs per day to an HBase cluster of 12 very underpowered 
>>>> machines.
>>>> 
>>>> The suggestions I received were two: 1) update to the latest hadoop-lzo 
>>>> and 2) specify a max direct memory size to the JVM (e.g. 
>>>> -XX:MaxDirectMemorySize=256m).
>>>> 
>>>> I took a third route - change my tables back to gz compression for the 
>>>> time being while I figure out what to do.  Since then, my memory usage has 
>>>> been rock steady, but more importantly my tables are roughly half the size 
>>>> on disk that they were with LZO, and there has been no noticeable drop in 
>>>> performance (but remember this is a write heavy workload, I'm not trying 
>>>> to serve an online workload with low latency or anything like that).  At 
>>>> this point, I might not return to LZO.
>>>> 
>>>> In general, I'm not convinced that "use LZO" is universally good advice 
>>>> for all HBase users.  For one thing, I think it assumes that all 
>>>> installations are focused towards low latency, which is not always the 
>>>> case (sometimes merely good latency is enough and great latency is not 
>>>> needed).  Secondly, it assumes some things about where the performance 
>>>> bottleneck lives.   For example, LZO performs well in micro-benchmarks, 
>>>> but if you find yourself in an IO-bound batch processing situation, you 
>>>> might be better served by a higher compression ratio, even if it's more 
>>>> computationally expensive.
>>>> 
>>>> Sandy
>>>> 
>>>>> -----Original Message-----
>>>>> From: Friso van Vollenhoven [mailto:[email protected]]
>>>>> Sent: Tuesday, January 04, 2011 08:00
>>>>> To: <[email protected]>
>>>>> Subject: Re: problem with LZO compressor on write only loads
>>>>> 
>>>>> I ran the job again, but with fewer other processes running on the same
>>>>> machine, so with more physical memory available to HBase. This was to see
>>>>> whether there was a point where it would stop allocating more buffers.
>>>>> When I did this, after many hours, one of the RSes crashed with an OOME.
>>>>> See here:
>>>>> 
>>>>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
>>>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>>>      at java.nio.Bits.reserveMemory(Bits.java:633)
>>>>>      at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>>>      at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>>>      at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>>>      at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>>>>      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>>>      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>>>      at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>>>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>>>      at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>>>      at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>>>      at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>>>>      at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>>>>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>>> 
>>>>> I am guessing the OS won't allocate any more memory to the process. As you
>>>>> can see, the used heap is nowhere near the max heap.
>>>>> 
>>>>> Also, it seems this happens during compaction. I had not considered
>>>>> compactions as a suspect yet. I could try running with a larger compaction
>>>>> threshold and more blocking store files. Since this is a write-only load,
>>>>> that should not be a problem. In our normal operation, compactions and
>>>>> splits are quite common, though, because we do a lot of read-modify-write
>>>>> cycles. Is anyone else doing update-heavy work with LZO?
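[The compaction tuning mentioned above could be sketched roughly as follows in hbase-site.xml. Property names are the 0.90-era ones; the values are purely illustrative, not recommendations.]

```xml
<!-- hbase-site.xml: raise the compaction trigger and the blocking
     store file limit so a pure write load compacts less often.
     Values are illustrative only. -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>6</value>   <!-- default in 0.90 is 3 -->
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>  <!-- default in 0.90 is 7 -->
</property>
```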
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>> Friso
>>>>> 
>>>>> 
>>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>>> 
>>>>>> Fishy. Are your cells particularly large? Or have you tuned the HFile
>>>>>> block size at all?
>>>>>> 
>>>>>> -Todd
>>>>>> 
>>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> I tried it, but it doesn't seem to help. The RS processes grew to
>>>>>>> 30GB within minutes after the job started.
>>>>>>> 
>>>>>>> Any ideas?
>>>>>>> 
>>>>>>> 
>>>>>>> Friso
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>>> 
>>>>>>>> Hi Friso,
>>>>>>>> 
>>>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>>> 
>>>>>>>> Can you try running with the environment variable
>>>>> MALLOC_ARENA_MAX=1 set?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>> 
>>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
>>>>>>>> [email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I seem to run into a problem that occurs when using LZO compression
>>>>>>>>> on a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO
>>>>>>>>> compressor code that supports the reinit() method (from Kevin Weil's
>>>>>>>>> github, version 0.4.8). There are several Hadoop LZO incarnations, so
>>>>>>>>> I am pointing my question to this list.
>>>>>>>>> 
>>>>>>>>> It looks like the compressor uses direct byte buffers to store the
>>>>>>>>> original and compressed bytes in memory, so the native code can work
>>>>>>>>> with them without the JVM having to copy anything around. The direct
>>>>>>>>> buffers are possibly reused after a reinit() call, but will often be
>>>>>>>>> newly created in the init() method, because the existing buffer can
>>>>>>>>> be the wrong size for reuse. The latter case leaves the buffers
>>>>>>>>> previously used by the compressor instance eligible for garbage
>>>>>>>>> collection. I think the problem is that this collection never occurs
>>>>>>>>> (in time), because the GC does not consider it necessary yet. The GC
>>>>>>>>> does not know about the native heap, and based on the state of the
>>>>>>>>> JVM heap there is no reason to finalize these objects yet. However,
>>>>>>>>> direct byte buffers are only freed in the finalizer, so the native
>>>>>>>>> heap keeps growing. On write-only loads, a full GC will rarely
>>>>>>>>> happen, because the max heap will not grow far beyond the memstores
>>>>>>>>> (no block cache is used). So what happens is that the machine starts
>>>>>>>>> using swap before the GC ever cleans up the direct byte buffers. I
>>>>>>>>> am guessing that without the reinit() support, the buffers were
>>>>>>>>> collected earlier, because the referring objects would also be
>>>>>>>>> collected every now and then, or things would perhaps just never
>>>>>>>>> promote to an older generation.
>>>>>>>>> 
>>>>>>>>> When I do a pmap on a running RS after it has grown to some 40GB
>>>>>>>>> resident size (with a 16GB heap), it shows a lot of near-64M anon
>>>>>>>>> blocks (presumably native heap). I saw this before with the 0.4.6
>>>>>>>>> version of Hadoop LZO, but that was under normal load. After that I
>>>>>>>>> went back to an HBase version that does not require the reinit().
>>>>>>>>> Now I am on 0.90 with the new LZO, but never did a heavy load like
>>>>>>>>> this one with it, until now...
>>>>>>>>> 
>>>>>>>>> Can anyone with a better understanding of the LZO code confirm that
>>>>>>>>> the above could be the case? If so, would it be possible to change
>>>>>>>>> the LZO compressor (and decompressor) to use maybe just one
>>>>>>>>> fixed-size buffer (they all appear near 64M anyway), or possibly to
>>>>>>>>> reuse an existing buffer even when it is not the exact required size
>>>>>>>>> but just large enough to make do? Having short-lived direct byte
>>>>>>>>> buffers is apparently a discouraged practice. If anyone can provide
>>>>>>>>> some pointers on what to look out for, I could invest some time in
>>>>>>>>> creating a patch.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Friso
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>> 
>>> 
>>> 
> 


