Hey everyone,

Just wanted to let you know that I will be looking into this this coming week - we've marked it as an important thing to investigate prior to our next beta release.
Thanks
-Todd

On Sat, Jan 8, 2011 at 4:59 AM, Tatsuya Kawano <[email protected]> wrote:
>
> Hi Friso,
>
> So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what would happen if you replace hadoop-core-*.jar in CDH3b3 with the one contained in the HBase 0.90RC distribution (hadoop-core-0.20-append-r1056497.jar) and then rebuild hadoop-lzo against it.
>
> Here is the comment on the LzoCompressor#reinit() method:
>
> -----------------------------------
> // ... this method isn't in vanilla 0.20.2, but is in CDH3b3 and YDH
> public void reinit(Configuration conf) {
> -----------------------------------
>
> https://github.com/kevinweil/hadoop-lzo/blob/6cbf4e232d7972c94107600567333a372ea08c0a/src/java/com/hadoop/compression/lzo/LzoCompressor.java#L196
>
> I don't know if hadoop-core-0.20-append-r1056497.jar is a vanilla 0.20.2 or more like CDH3b3. Maybe I'm wrong, but if it doesn't call reinit(), you'll have a good chance of getting a stable HBase 0.90.
>
> Good luck!
>
> Tatsuya
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
> http://twitter.com/#!/tatsuya6502
>
>
> On 01/08/2011, at 6:33 PM, Friso van Vollenhoven wrote:
>
> > Hey Ryan,
> >
> > I went back to the older version. The problem is that going to HBase 0.90 requires an API change on the compressor side, which forces you to a version newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is again not compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89 is stable for us, so this is not at all a problem. But this LZO problem is really in the way of our projected upgrade path (my client would like to end up with CDH3 for everything in the end, because of the support options available in case things go wrong and the Cloudera administration courses available when new ops people are hired).
> >
> > Cheers,
> > Friso
> >
> >
> > On 7 jan 2011, at 22:28, Ryan Rawson wrote:
> >
> >> Hey,
> >>
> >> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression. I know some of the newer versions had bugs which leaked DirectByteBuffer space, which might be what you are running into.
> >>
> >> Give the older version a shot; there really hasn't been much change in the way LZO works in a while, and most of the 'extra' stuff added was to support features HBase does not use.
> >>
> >> Good luck!
> >>
> >> -ryan
> >>
> >> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
> >>
> >>
> >> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven <[email protected]> wrote:
> >>> Thanks Sandy.
> >>>
> >>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?
> >>>
> >>> I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.
> >>>
> >>>
> >>> Friso
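
To illustrate Friso's question above about -XX:MaxDirectMemorySize: on the Sun/OpenJDK 6 JVMs of this era, java.nio.Bits.reserveMemory reacts to hitting the configured limit by triggering a System.gc() (plus a short sleep) and only throws the "Direct buffer memory" OutOfMemoryError if that still does not free enough reserved space, so the flag does act as a best-effort GC trigger for unreachable direct buffers. It can still OOME when the buffers are still referenced or cleanup lags, as the stack trace further down in this thread shows. The following stand-alone sketch is hypothetical demo code (not HBase or hadoop-lzo code; the class name and sizes are made up) that mimics a codec replacing ~64 MB direct buffers over and over without the Java heap ever filling up:

-----------------------------------
// Hypothetical demo; run it with and without a cap, for example:
//   java -Xmx8g DirectBufferChurn
//   java -Xmx8g -XX:MaxDirectMemorySize=256m DirectBufferChurn
import java.nio.ByteBuffer;

public class DirectBufferChurn {
    public static void main(String[] args) {
        long allocatedMb = 0;
        while (true) {
            // Allocate a ~64 MB direct buffer and immediately drop the reference,
            // like a compressor re-creating its buffers on every size change.
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            buf.put(0, (byte) 1); // touch it so the native memory is really committed
            buf = null;
            allocatedMb += 64;
            if (allocatedMb % 1024 == 0) {
                System.out.println(allocatedMb + " MB of direct buffers allocated so far");
            }
        }
    }
}
-----------------------------------

The interesting part is what happens at the limit: as long as the previously allocated buffers are unreachable, the allocation does not fail immediately, because the JVM forces a collection first. Without any GC pressure on the Java heap, nothing else ever forces that collection, which is the behaviour Friso describes further down.
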
> >>>
> >>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
> >>>
> >>>> I was in a similar situation recently, with similar symptoms, and I experienced a crash very similar to yours. I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago. My workload is fairly write-heavy. I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.
> >>>>
> >>>> The suggestions I received were two: 1) update to the latest hadoop-lzo and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).
> >>>>
> >>>> I took a third route - change my tables back to gz compression for the time being while I figure out what to do. Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write-heavy workload; I'm not trying to serve an online workload with low latency or anything like that). At this point, I might not return to LZO.
> >>>>
> >>>> In general, I'm not convinced that "use LZO" is universally good advice for all HBase users. For one thing, I think it assumes that all installations are focused on low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed). Secondly, it assumes some things about where the performance bottleneck lives. For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch processing situation, you might be better served by a higher compression ratio, even if it's more computationally expensive.
> >>>>
> >>>> Sandy
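
For anyone who wants to try the same route Sandy took, switching an existing column family from LZO to gzip is an admin operation plus a major compaction to rewrite the store files. A rough sketch using the 0.90-era HBase shell follows; the table and family names are made up, so double-check the syntax against your release (0.90 requires the table to be disabled for the alter):

-----------------------------------
disable 'mytable'
alter 'mytable', {NAME => 'mycf', COMPRESSION => 'GZ'}
enable 'mytable'
# existing HFiles keep their old compression until they are rewritten:
major_compact 'mytable'
-----------------------------------
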
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Friso van Vollenhoven [mailto:[email protected]]
> >>>>> Sent: Tuesday, January 04, 2011 08:00
> >>>>> To: <[email protected]>
> >>>>> Subject: Re: problem with LZO compressor on write only loads
> >>>>>
> >>>>> I ran the job again, but with fewer other processes running on the same machine, so with more physical memory available to HBase. This was to see whether there was a point where it would stop allocating more buffers. When I do this, after many hours, one of the RSes crashed with an OOME. See here:
> >>>>>
> >>>>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
> >>>>> java.lang.OutOfMemoryError: Direct buffer memory
> >>>>>     at java.nio.Bits.reserveMemory(Bits.java:633)
> >>>>>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
> >>>>>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> >>>>>     at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
> >>>>>     at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
> >>>>>     at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
> >>>>>     at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
> >>>>>     at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
> >>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
> >>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
> >>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
> >>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
> >>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
> >>>>>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
> >>>>>     at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
> >>>>>     at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
> >>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
> >>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
> >>>>>     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
> >>>>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
> >>>>>
> >>>>> I am guessing the OS won't allocate any more memory to the process. As you can see, the used heap is nowhere near the max heap.
> >>>>>
> >>>>> Also, this seems to happen during compaction. I had not considered compactions a suspect yet. I could try running with a larger compaction threshold and more blocking store files. Since this is a write-only load, it should not be a problem. In our normal operation, compactions and splits are quite common, though, because we do read-modify-write cycles a lot. Anyone else doing update-heavy work with LZO?
> >>>>>
> >>>>>
> >>>>> Cheers,
> >>>>> Friso
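
The knobs Friso mentions for postponing compactions on a write-only load are ordinary hbase-site.xml settings. A hypothetical fragment with the 0.90-era property names (the values here are only examples, not recommendations):

-----------------------------------
<!-- only start a minor compaction once a store has this many files (default 3) -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>6</value>
</property>
<!-- only block updates to a region beyond this many store files (default 7) -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>20</value>
</property>
-----------------------------------
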
> >>>>>
> >>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
> >>>>>
> >>>>>> Fishy. Are your cells particularly large? Or have you tuned the HFile block size at all?
> >>>>>>
> >>>>>> -Todd
> >>>>>>
> >>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <[email protected]> wrote:
> >>>>>>
> >>>>>>> I tried it, but it doesn't seem to help. The RS processes grow to 30Gb in minutes after the job started.
> >>>>>>>
> >>>>>>> Any ideas?
> >>>>>>>
> >>>>>>> Friso
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
> >>>>>>>
> >>>>>>>> Hi Friso,
> >>>>>>>>
> >>>>>>>> Which OS are you running? Particularly, which version of glibc?
> >>>>>>>>
> >>>>>>>> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> -Todd
> >>>>>>>>
> >>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I seem to run into a problem that occurs when using LZO compression on a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my question to this list.
> >>>>>>>>>
> >>>>>>>>> It looks like the compressor uses direct byte buffers to store the original and compressed bytes in memory, so the native code can work with them without the JVM having to copy anything around. The direct buffers are possibly reused after a reinit() call, but will often be newly created in the init() method, because the existing buffer can be the wrong size for reuse. The latter case leaves the buffers previously used by the compressor instance eligible for garbage collection. I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet. The GC does not know about the native heap, and based on the state of the JVM heap, there is no reason to finalize these objects yet. However, direct byte buffers are only freed in the finalizer, so the native heap keeps growing. On write-only loads, a full GC will rarely happen, because the used heap will not grow far beyond the memstores (no block cache is used). So what happens is that the machine starts using swap before the GC will ever clean up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier because the referring objects would also be collected every now and then, or things would perhaps just never promote to an older generation.
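
For what it's worth, the usual (if ugly) workaround for exactly the situation described above, short-lived direct buffers whose native memory is only reclaimed once the GC happens to notice them, is to release that memory eagerly through the buffer's Cleaner. This is a sketch of that idea using Sun/OpenJDK 6/7 internals (sun.nio.ch.DirectBuffer and sun.misc.Cleaner); it is not something hadoop-lzo does, and the class and method names below are made up:

-----------------------------------
import java.nio.ByteBuffer;

// Hypothetical helper, not hadoop-lzo code. Relies on Sun/OpenJDK 6/7 internals,
// so it is non-portable by design.
public final class DirectBuffers {
    private DirectBuffers() {}

    /** Free the native memory behind a direct buffer right away instead of
     *  waiting for the garbage collector to get around to it. */
    static void freeEagerly(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return;
        }
        sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
        if (cleaner != null) {
            cleaner.clean(); // the same deallocation that normally only runs after GC
        }
        // The caller must drop its own reference afterwards; touching the buffer
        // after clean() would access freed native memory.
    }
}
-----------------------------------
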
> >>>>>>>>>
> >>>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb resident size (with a 16Gb heap), it shows a lot of near-64M anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to an HBase version that does not require the reinit(). Now I am on 0.90 with the new LZO, but I never did a heavy load like this one with it, until now...
> >>>>>>>>>
> >>>>>>>>> Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use maybe just one fixed-size buffer (they all appear near 64M anyway) or possibly reuse an existing buffer also when it is not the exact required size but just large enough to make do? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Friso
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Todd Lipcon
> >>>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Todd Lipcon
> >>>>>> Software Engineer, Cloudera
> >>>>
> >>>
> >
>

--
Todd Lipcon
Software Engineer, Cloudera
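
Finally, the reuse idea Friso floats in the quoted message (keep the existing direct buffer whenever it is already large enough, instead of allocating an exact-size replacement) would look roughly like the sketch below. This is illustrative code only, not the actual hadoop-lzo implementation; the class, field, and method names are invented:

-----------------------------------
import java.nio.ByteBuffer;

public class ReusableBufferHolder {
    private ByteBuffer directBuf;

    /**
     * Return a direct buffer with room for at least requiredSize bytes,
     * reusing the previous one when possible so the compressor stops
     * churning through short-lived direct buffers.
     */
    ByteBuffer ensureCapacity(int requiredSize) {
        if (directBuf == null || directBuf.capacity() < requiredSize) {
            // Only allocate when the requirement actually outgrows what we have.
            directBuf = ByteBuffer.allocateDirect(requiredSize);
        }
        // Reset position/limit so callers see exactly the window they asked for.
        directBuf.clear();
        directBuf.limit(requiredSize);
        return directBuf;
    }
}
-----------------------------------

The trade-off is that the holder keeps its largest-ever buffer alive for its own lifetime, which is exactly what caps the native footprint per compressor instance instead of letting it churn.
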
