Hey Todd,

Just FYI, I have only tried the 0.4.8 LZO version with the G1 collector, not CMS. When I saw the problem with earlier versions I did a run with both G1 and CMS and it looked the same. I am not sure if it makes a difference, though.

My guess is that the problem occurs because the byte buffers created by the compressor objects are reused a couple of times, which makes them longer lived and promotes them out of young gen. That keeps them from being finalized for a long time, which in turn means the native allocations are never released. But this is just my hunch. I have not looked into verifying this...
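
To make that hunch a bit more concrete, this is roughly the pattern I have in mind (just a toy sketch, not the actual hadoop-lzo code; the class name and sizes are made up):

-----------------------------------
// Toy illustration of the suspected pattern, not the actual LzoCompressor code.
import java.nio.ByteBuffer;

public class DirectBufferSuspect {

    // The Java objects are tiny; the 64 MB live on the native heap.
    private ByteBuffer uncompressed = ByteBuffer.allocateDirect(64 * 1024 * 1024);
    private ByteBuffer compressed = ByteBuffer.allocateDirect(64 * 1024 * 1024);

    // Mimics reinit()/init(): when the requested size differs, the old buffers are
    // dropped and new ones allocated. The native memory behind the dropped buffers
    // only comes back once the GC gets around to finalizing them, and if this
    // object lives in a pool long enough to be promoted, that can take a while.
    void reinit(int bufferSize) {
        if (uncompressed.capacity() != bufferSize) {
            uncompressed = ByteBuffer.allocateDirect(bufferSize);
            compressed = ByteBuffer.allocateDirect(bufferSize);
        }
    }

    public static void main(String[] args) {
        DirectBufferSuspect pooled = new DirectBufferSuspect();
        // Alternating sizes forces a reallocation on every call. How quickly the
        // dropped buffers are reclaimed depends entirely on when a (full) GC runs
        // or the JVM's direct-memory limit forces one; until then the native
        // allocations stay around even though the Java heap is nearly empty.
        for (int i = 0; i < 50; i++) {
            pooled.reinit(i % 2 == 0 ? 64 * 1024 * 1024 : 32 * 1024 * 1024);
        }
    }
}
-----------------------------------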
Friso

On 9 jan 2011, at 03:48, Todd Lipcon wrote:

> Hey everyone,
>
> Just wanted to let you know that I will be looking into this this coming week - we've marked it as an important thing to investigate prior to our next beta release.
>
> Thanks
> -Todd
>
> On Sat, Jan 8, 2011 at 4:59 AM, Tatsuya Kawano <[email protected]> wrote:
>
>> Hi Friso,
>>
>> So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what would happen if you replace hadoop-core-*.jar in CDH3b3 with the one contained in the HBase 0.90 RC distribution (hadoop-core-0.20-append-r1056497.jar) and then rebuild hadoop-lzo against it.
>>
>> Here is the comment on the LzoCompressor#reinit() method:
>>
>> -----------------------------------
>> // ... this method isn't in vanilla 0.20.2, but is in CDH3b3 and YDH
>> public void reinit(Configuration conf) {
>> -----------------------------------
>>
>> https://github.com/kevinweil/hadoop-lzo/blob/6cbf4e232d7972c94107600567333a372ea08c0a/src/java/com/hadoop/compression/lzo/LzoCompressor.java#L196
>>
>> I don't know whether hadoop-core-0.20-append-r1056497.jar is a vanilla 0.20.2 or more like CDH3b3. Maybe I'm wrong, but if it doesn't call reinit(), you'll have a good chance of getting a stable HBase 0.90.
>>
>> Good luck!
>>
>> Tatsuya
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>> http://twitter.com/#!/tatsuya6502
>>
>> On 01/08/2011, at 6:33 PM, Friso van Vollenhoven wrote:
>>
>>> Hey Ryan,
>>>
>>> I went back to the older version. The problem is that going to HBase 0.90 requires an API change on the compressor side, which forces you to a version newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is in turn not compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89 is stable for us, so this is not a problem at all. But this LZO problem is really in the way of our projected upgrade path (my client would like to end up with CDH3 for everything, because of the support options available in case things go wrong and the Cloudera administration courses available when new ops people are hired).
>>>
>>> Cheers,
>>> Friso
>>>
>>> On 7 jan 2011, at 22:28, Ryan Rawson wrote:
>>>
>>>> Hey,
>>>>
>>>> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression. I know some of the newer versions had bugs which leaked DirectByteBuffer space, which might be what you are running into.
>>>>
>>>> Give the older version a shot; there really hasn't been much change in how LZO works in a while, and most of the 'extra' stuff that was added supports features HBase does not use.
>>>>
>>>> Good luck!
>>>>
>>>> -ryan
>>>>
>>>> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
>>>>
>>>> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven <[email protected]> wrote:
>>>>> Thanks Sandy.
>>>>>
>>>>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?
>>>>>
>>>>> I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.
>>>>>
>>>>> Friso
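
(Answering part of my own question above: my understanding of the Sun JDK is that the direct-buffer allocation path itself calls System.gc() once the configured limit is reached and only throws the OOME if that does not free enough, so unreachable buffers do get cleaned up at the limit, while reachable ones just fail faster. A little experiment I have been meaning to run, so treat it as a sketch, not a verified result:)

-----------------------------------
// Rough way to see what -XX:MaxDirectMemorySize does. Run with e.g.:
//   java -XX:MaxDirectMemorySize=256m -verbose:gc DirectLimitSketch
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectLimitSketch {
    public static void main(String[] args) {
        // Case 1: buffers become unreachable right away. When the 256m limit is
        // hit, the reservation code in the JDK attempts a System.gc() before
        // giving up, the discarded buffers are cleaned up and the loop continues.
        for (int i = 0; i < 100; i++) {
            ByteBuffer.allocateDirect(64 * 1024 * 1024);
        }

        // Case 2: buffers stay reachable (as they would inside pooled compressors).
        // No GC can help here, so this should end in "OutOfMemoryError: Direct
        // buffer memory" from Bits.reserveMemory instead of eating all physical RAM.
        List<ByteBuffer> retained = new ArrayList<ByteBuffer>();
        for (int i = 0; i < 100; i++) {
            retained.add(ByteBuffer.allocateDirect(64 * 1024 * 1024));
        }
    }
}
-----------------------------------

As far as I know, -XX:+DisableExplicitGC would also disable that fallback GC, which would make things worse rather than better here.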
>>>>>
>>>>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>>>>>
>>>>>> I was in a similar situation recently, with similar symptoms, and I experienced a crash very similar to yours. I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago. My workload is fairly write-heavy. I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.
>>>>>>
>>>>>> I received two suggestions: 1) update to the latest hadoop-lzo, and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).
>>>>>>
>>>>>> I took a third route - changing my tables back to gz compression for the time being while I figure out what to do. Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write-heavy workload; I'm not trying to serve an online workload with low latency or anything like that). At this point, I might not return to LZO.
>>>>>>
>>>>>> In general, I'm not convinced that "use LZO" is universally good advice for all HBase users. For one thing, it assumes that all installations are focused on low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed). Secondly, it assumes some things about where the performance bottleneck lives. For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch processing situation, you might be better served by a higher compression ratio, even if it's more computationally expensive.
>>>>>>
>>>>>> Sandy
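
(For anyone who wants to try Sandy's route: switching compression is just a schema change. Something like this creates a table with a GZ-compressed family using the 0.90 client API; an untested sketch, and the table and family names are made up.)

-----------------------------------
// Untested sketch: create a table with GZ compression on one family,
// using the 0.90-era client API. Table and family names are made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateGzTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // GZ instead of LZO: slower per block, but better ratio and pure Java.
        HColumnDescriptor family = new HColumnDescriptor("data");
        family.setCompressionType(Compression.Algorithm.GZ);

        HTableDescriptor table = new HTableDescriptor("blobs");
        table.addFamily(family);
        admin.createTable(table);
    }
}
-----------------------------------

For an existing table it would be a disable / alter / enable cycle instead, or the equivalent from the shell.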
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Friso van Vollenhoven [mailto:[email protected]]
>>>>>>> Sent: Tuesday, January 04, 2011 08:00
>>>>>>> To: <[email protected]>
>>>>>>> Subject: Re: problem with LZO compressor on write only loads
>>>>>>>
>>>>>>> I ran the job again, but with fewer other processes running on the same machine, so with more physical memory available to HBase. This was to see whether there was a point where it would stop allocating more buffers. When I do this, after many hours, one of the RSes crashed with an OOME. See here:
>>>>>>>
>>>>>>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
>>>>>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>>>>>         at java.nio.Bits.reserveMemory(Bits.java:633)
>>>>>>>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>>>>>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>>>>>         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>>>>>         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>>>>>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>>>>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>>>>>         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>>>>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>>>>>         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>>>>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>>>>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>>>>>>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>>>>>>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>>>>>
>>>>>>> I am guessing the OS won't allocate any more memory to the process. As you can see, the used heap is nowhere near the max heap.
>>>>>>>
>>>>>>> Also, this happens during compaction, it seems. I had not considered compactions as a suspect yet. I could try running with a larger compaction threshold and a higher blocking store files limit. Since this is a write-only load, that should not be a problem. In our normal operation, compactions and splits are quite common, though, because we do a lot of read-modify-write cycles. Is anyone else doing update-heavy work with LZO?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Friso
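
(Side note on reading that trace bottom-up: each new HFile block asks the codec pool for a compressor, and the pool reinitializes the pooled instance, which in LzoCompressor can fall through to init() and another allocateDirect(). Roughly this; a simplified sketch based on my reading of the CDH3b3 hadoop-core and hadoop-lzo 0.4.8, so details may be off.)

-----------------------------------
// Simplified sketch of the code path in the trace above, not the real HBase code.
// Assumes the CDH3b3 hadoop-core, hadoop-lzo 0.4.8 and the native LZO library.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Compressor;
import com.hadoop.compression.lzo.LzoCodec;

public class CompressorPerBlockSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        LzoCodec codec = new LzoCodec();
        codec.setConf(conf);

        // HFile.Writer does the equivalent of this for every block it writes:
        for (int block = 0; block < 10; block++) {
            // The conf-taking overload calls reinit(conf) on the pooled compressor
            // (the :105/:112 pair in the trace); in LzoCompressor that can run
            // init() and allocateDirect() again, leaving the previous direct
            // buffers to be finalized eventually.
            Compressor compressor = CodecPool.getCompressor(codec, conf);
            try {
                // ... compress the block's bytes ...
            } finally {
                CodecPool.returnCompressor(compressor);
            }
        }
    }
}
-----------------------------------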
>>>>>>>
>>>>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>>>>>
>>>>>>>> Fishy. Are your cells particularly large? Or have you tuned the HFile block size at all?
>>>>>>>>
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I tried it, but it doesn't seem to help. The RS processes grow to 30Gb in minutes after the job starts.
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>> Friso
>>>>>>>>>
>>>>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>>>>>
>>>>>>>>>> Hi Friso,
>>>>>>>>>>
>>>>>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>>>>>
>>>>>>>>>> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I seem to run into a problem that occurs when using LZO compression on a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my question to this list.
>>>>>>>>>>>
>>>>>>>>>>> It looks like the compressor uses direct byte buffers to store the original and compressed bytes in memory, so the native code can work with them without the JVM having to copy anything around. The direct buffers are possibly reused after a reinit() call, but will often be newly created in the init() method, because the existing buffer can be the wrong size for reuse. The latter case leaves the buffers previously used by the compressor instance eligible for garbage collection. I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet. The GC does not know about the native heap, and based on the state of the JVM heap there is no reason to finalize these objects yet. However, direct byte buffers are only freed in the finalizer, so the native heap keeps growing. On write-only loads, a full GC will rarely happen, because the max heap will not grow far beyond the memstores (no block cache is used). So what happens is that the machine starts using swap before the GC ever cleans up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier, because the referring objects would also be collected every now and then, or things would perhaps just never promote to an older generation.
>>>>>>>>>>>
>>>>>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb resident size (with a 16Gb heap), it shows a lot of near-64M anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to an HBase version that does not require the reinit().
>>>>>>>>>>> Now I am on 0.90 with the new LZO, but I never did a heavy load like this one with that, until now...
>>>>>>>>>>>
>>>>>>>>>>> Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use maybe just one fixed-size buffer (they all appear to be near 64M anyway), or possibly to reuse an existing buffer even when it is not the exact required size but just large enough to make do? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Friso
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
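
PS: to make the reuse idea from my original mail concrete, this is roughly the change I have in mind for the init() path: keep the existing direct buffer whenever it is at least as large as what is being asked for, and only allocate when it is missing or too small. Just a sketch with made-up field names, not a patch against the real code (which also has to reset the native LZO state):

-----------------------------------
// Sketch of the reuse idea only; not the actual hadoop-lzo code. Field names are
// made up, and the real init() also has to (re)initialize the native LZO state.
import java.nio.ByteBuffer;

public class BufferReuseSketch {

    private ByteBuffer uncompressedDirectBuf;
    private ByteBuffer compressedDirectBuf;

    void init(int requestedSize) {
        // Only allocate when there is no buffer yet, or the existing one is too
        // small. A buffer that is larger than needed is simply reused, so the old
        // native allocation never has to wait for a finalizer.
        if (uncompressedDirectBuf == null || uncompressedDirectBuf.capacity() < requestedSize) {
            uncompressedDirectBuf = ByteBuffer.allocateDirect(requestedSize);
        }
        if (compressedDirectBuf == null || compressedDirectBuf.capacity() < requestedSize) {
            compressedDirectBuf = ByteBuffer.allocateDirect(requestedSize);
        }

        // Reset positions/limits so a reused (possibly oversized) buffer behaves
        // like a fresh one of the requested size.
        uncompressedDirectBuf.clear();
        uncompressedDirectBuf.limit(requestedSize);
        compressedDirectBuf.clear();
        compressedDirectBuf.limit(requestedSize);
    }
}
-----------------------------------

The decompressor side could get the same treatment.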
