Hey, here at SU we continue to use version 0.1.0 of hadoop-gpl-compression. I know some of the newer versions had bugs which leaked DirectByteBuffer space, which might be what you are running into.
Give the older version a shot; there really hasn't been much change in how LZO works in a while, and most of the 'extra' stuff added was to support features HBase does not use. Good luck!

-ryan

PS: http://code.google.com/p/hadoop-gpl-compression/downloads/list

On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven <[email protected]> wrote:
> Thanks Sandy.
>
> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?
>
> I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.
>
> Friso
>
> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>
>> I was in a similar situation recently, with similar symptoms, and I experienced a crash very similar to yours. I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago. My workload is fairly write-heavy. I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.
>>
>> The suggestions I received were two: 1) update to the latest hadoop-lzo and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).
>>
>> I took a third route - change my tables back to gz compression for the time being while I figure out what to do. Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write-heavy workload; I'm not trying to serve an online workload with low latency or anything like that). At this point, I might not return to LZO.
>>
>> In general, I'm not convinced that "use LZO" is universally good advice for all HBase users. For one thing, I think it assumes that all installations are focused on low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed). Secondly, it assumes some things about where the performance bottleneck lives. For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch processing situation, you might be better served by a higher compression ratio, even if it's more computationally expensive.
>>
>> Sandy
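(A quick aside on Friso's -XX:MaxDirectMemorySize question above: in the Sun JDK of that era, java.nio.Bits.reserveMemory() reacts to running out of reservable direct memory by calling System.gc(), sleeping briefly, and retrying once before it throws the "Direct buffer memory" OOME, so the flag does give the collector one chance to reclaim dead buffers first. If explicit GC is disabled with -XX:+DisableExplicitGC, that fallback is a no-op and you get the OOME straight away. The standalone sketch below reproduces the allocation pattern being discussed; it is not HBase or hadoop-lzo code and the class name is made up. With the flag set it either keeps running or fails fast at the limit, depending on whether that fallback GC is allowed; without the flag the resident size of the process just keeps growing.)

import java.nio.ByteBuffer;

// Standalone illustration only (made-up class name, not HBase code).
// Run with e.g.:  java -Xmx16g -XX:MaxDirectMemorySize=256m DirectBufferChurn
// Every iteration allocates a 64 MB direct buffer and immediately drops the
// reference, the same pattern as a compressor re-allocating its buffers in
// init(): the old buffers are garbage, but their native memory is only
// released once a GC actually runs and their cleaners fire.
public class DirectBufferChurn {
    public static void main(String[] args) {
        int i = 0;
        while (true) {
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            buf.put(0, (byte) 1); // write something into it, as a compressor would
            if (++i % 16 == 0) {
                System.out.println("allocated " + i + " x 64 MB of direct buffers so far");
            }
        }
    }
}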
>>> -----Original Message-----
>>> From: Friso van Vollenhoven [mailto:[email protected]]
>>> Sent: Tuesday, January 04, 2011 08:00
>>> To: <[email protected]>
>>> Subject: Re: problem with LZO compressor on write only loads
>>>
>>> I ran the job again, but with fewer other processes running on the same machine, so with more physical memory available to HBase. This was to see whether there was a point where it would stop allocating more buffers. When I do this, after many hours, one of the RSes crashed with an OOME. See here:
>>>
>>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>         at java.nio.Bits.reserveMemory(Bits.java:633)
>>>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>
>>> I am guessing the OS won't allocate any more memory to the process. As you can see, the used heap is nowhere near the max heap.
>>>
>>> Also, this happens from the compaction, it seems. I had not considered those as a suspect yet. I could try running with a larger compaction threshold and blocking store files. Since this is a write-only load, it should not be a problem. In our normal operation, compactions and splits are quite common, though, because we do read-modify-write cycles a lot. Anyone else doing update-heavy work with LZO?
>>>
>>> Cheers,
>>> Friso
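(A note on what that stack trace points at: every new HFile block written by the compaction borrows a compressor from CodecPool and re-initializes it, and, as Friso describes in his original mail further down the thread, an init() that decides the existing buffers are the wrong size allocates fresh direct buffers and orphans the old ones. The toy model below tries to make that churn concrete; the class and method names are invented and this is not the hadoop-lzo source.)

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only (invented names, not the real hadoop-lzo code): one pooled
// compressor object is reused for every block, but each re-init with a
// different buffer size replaces its direct buffers, leaving the old ones in
// native memory until a full GC lets their cleaners run.
public class CompressorChurnModel {

    static class ToyCompressor {
        ByteBuffer uncompressed;  // direct buffers, as in the LZO compressor
        ByteBuffer compressed;

        void init(int bufferSize) {
            // Re-allocate whenever the requested size is not exactly the
            // current capacity, the behaviour described in the thread.
            if (uncompressed == null || uncompressed.capacity() != bufferSize) {
                uncompressed = ByteBuffer.allocateDirect(bufferSize);
                compressed = ByteBuffer.allocateDirect(bufferSize);
            }
            uncompressed.clear();
            compressed.clear();
        }
    }

    public static void main(String[] args) {
        Deque<ToyCompressor> pool = new ArrayDeque<ToyCompressor>();
        pool.add(new ToyCompressor());  // a single pooled instance is enough

        // Two buffer sizes that differ slightly, so every init() misses the
        // exact-match test and re-allocates.
        int[] sizes = { 64 * 1024 * 1024, 64 * 1024 * 1024 + 4096 };

        // One iteration per HFile block written by a long compaction.
        for (int block = 0; block < 1000; block++) {
            ToyCompressor c = pool.poll();   // stands in for CodecPool.getCompressor()
            c.init(sizes[block % 2]);        // stands in for reinit()/init()
            // ... compress the block here ...
            pool.add(c);                     // stands in for CodecPool.returnCompressor()
        }
        // Only one compressor object ever exists, yet the loop has asked the
        // JVM for roughly 1000 x 2 x 64 MB of direct memory; whether that ends
        // in swapping or an OOME depends entirely on when the GC gets around
        // to the dead buffers.
    }
}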
>>>
>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>
>>>> Fishy. Are your cells particularly large? Or have you tuned the HFile block size at all?
>>>>
>>>> -Todd
>>>>
>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <[email protected]> wrote:
>>>>
>>>>> I tried it, but it doesn't seem to help. The RS processes grow to 30Gb in minutes after the job started.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Friso
>>>>>
>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>
>>>>>> Hi Friso,
>>>>>>
>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>
>>>>>> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
>>>>>>
>>>>>> Thanks
>>>>>> -Todd
>>>>>>
>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I seem to run into a problem that occurs when using LZO compression on a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my question to this list.
>>>>>>>
>>>>>>> It looks like the compressor uses direct byte buffers to store the original and compressed bytes in memory, so the native code can work with them without the JVM having to copy anything around. The direct buffers are possibly reused after a reinit() call, but will often be newly created in the init() method, because the existing buffer can be the wrong size for reusing. The latter case leaves the buffers previously used by the compressor instance eligible for garbage collection. I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet. The GC does not know about the native heap, and based on the state of the JVM heap there is no reason to finalize these objects yet. However, direct byte buffers are only freed in the finalizer, so the native heap keeps growing. On write-only loads, a full GC will rarely happen, because the heap will not grow far beyond the memstores (no block cache is used). So what happens is that the machine starts using swap before the GC ever cleans up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier because the referring objects would also be collected every now and then, or things would perhaps just never promote to an older generation.
>>>>>>>
>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb resident size (with a 16Gb heap), it shows a lot of near-64M anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to an HBase version that does not require the reinit(). Now I am on 0.90 with the new LZO, but never did a heavy load like this one with it, until now...
>>>>>>>
>>>>>>> Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use maybe just one fixed-size buffer (they all appear near 64M anyway), or possibly reuse an existing buffer also when it is not the exact required size but just large enough to make do? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Friso
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
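For what it's worth, the reuse strategy Friso proposes at the end (keep the existing buffer whenever it is at least as large as what init() asks for, instead of insisting on an exact size match) would look roughly like the sketch below. The names are placeholders and this is not a patch against the real LzoCompressor, just an illustration of the allocation decision.

import java.nio.ByteBuffer;

// Sketch of the "reuse if large enough" idea (placeholder names, not the real
// hadoop-lzo code): the buffer is only replaced when it is genuinely too
// small, so a pooled compressor settles on one allocation instead of churning
// through a new 64 MB direct buffer on every size change.
final class ReusableDirectBuffer {
    private ByteBuffer buffer;

    ByteBuffer get(int requiredSize) {
        if (buffer == null || buffer.capacity() < requiredSize) {
            // An exact-match test (capacity() != requiredSize) is what causes
            // the churn; allocating only when the buffer is too small avoids it.
            buffer = ByteBuffer.allocateDirect(requiredSize);
        }
        buffer.clear();
        buffer.limit(requiredSize); // expose only the window the caller asked for
        return buffer;
    }
}

Since the buffers in the pmap output all sit near 64 MB anyway, this ends up close to the single fixed-size buffer Friso also mentions as an option.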
