On Fri, Dec 17, 2010 at 2:32 PM, Sandy Pratt <[email protected]> wrote:
> Todd,
>
> While we're on the subject, and since you seem to know LZO well, can you
> answer a few questions that have been playing around in my mind lately?
>
> 1) Does GZ also use the Direct Memory Buffer like LZO does?
I don't know much about the gzip codec, but I believe so long as you're
using the native one (i.e. have the Hadoop native libraries installed) it
is very similar, yes.

> 2) What size do you run with for that buffer? I kicked it up to 512m the
> other day and I haven't seen problems, but I wonder if that's overkill.

Which buffer are you referring to? I don't do any particular tuning for
the LZO codec. I do usually set io.file.buffer.size to 128KB in Hadoop,
but that's at a different layer.

> 3) How do you think LZO memory use compares to GZ? The reason I ask is
> because ISTR reading that GZ is very light on memory. If it's
> significantly lighter than LZO, it might be worth my while to use GZ
> instead, even though it's slower than LZO, and use the freed memory to
> allocate another map slot.

All the LZO buffers are pooled and pretty transient so long as there
isn't a leak (like the bug you hit). Without a leak it should be
responsible for <1 MB of memory usage, in my experience.

Thanks
-Todd

> -----Original Message-----
> From: Sandy Pratt [mailto:[email protected]]
> Sent: Friday, December 17, 2010 14:04
> To: [email protected]
> Subject: RE: Simple OOM crash?
>
> That worked. Thanks!
>
> -----Original Message-----
> From: Todd Lipcon [mailto:[email protected]]
> Sent: Friday, December 17, 2010 13:54
> To: [email protected]
> Subject: Re: Simple OOM crash?
>
> Hi Sandy,
>
> I've seen that error on GitHub as well. Try using the git:// URL instead
> of the http:// URL. The http transport in git is a bit buggy.
>
> Worst case, there's also an option to download a tarball there.
>
> -Todd
>
> On Fri, Dec 17, 2010 at 10:59 AM, Sandy Pratt <[email protected]> wrote:
>> Thanks all for your help.
>>
>> I set about to update the hadoop-lzo jar using Todd Lipcon's git repo
>> (https://github.com/toddlipcon/hadoop-lzo), and I encountered an error.
>> I'm not a git user, so I could be doing something wrong, but I'm not
>> sure what.
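Todd's advice upthread — swap the flaky http transport for the git:// protocol — can be sketched like this. The remote name `origin` and branch `master` are assumptions about how the clone was made, and the git commands are shown as comments since they need network access:

```shell
# Sketch: derive the git:// URL from the http:// one that failed.
old_url="http://github.com/toddlipcon/hadoop-lzo.git"
new_url="${old_url/#http:/git:}"   # swap only the leading scheme
echo "$new_url"

# Then, inside the existing clone (assumed remote/branch names):
#   git remote set-url origin "$new_url"
#   git pull origin master
```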
>> Has something changed with this repo in the last month or two?
>>
>> The error is pasted below:
>>
>> [had...@ets-lax-prod-hadoop-01 hadoop-lzo]$ git pull
>> walk 7cbf6e85ad992faac880ef54a78ce926b6c02bda
>> walk fdbddcafd8276497d0181d40d72756336d204374
>> Getting alternates list for http://github.com/toddlipcon/hadoop-lzo.git
>> Also look at http://github.com/network/312869.git/
>> error: The requested URL returned error: 502 (curl_result = 22,
>> http_code = 502, sha1 = 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4)
>> Getting pack list for http://github.com/toddlipcon/hadoop-lzo.git
>> Getting pack list for http://github.com/network/312869.git/
>> error: The requested URL returned error: 502
>> error: Unable to find 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4 under
>> http://github.com/toddlipcon/hadoop-lzo.git
>> Cannot obtain needed commit 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4
>> while processing commit fdbddcafd8276497d0181d40d72756336d204374.
>> fatal: Fetch failed.
>>
>> Thanks,
>>
>> Sandy
>>
>> -----Original Message-----
>> From: Andrew Purtell [mailto:[email protected]]
>> Sent: Thursday, December 16, 2010 17:22
>> To: [email protected]
>> Cc: Cosmin Lehene
>> Subject: RE: Simple OOM crash?
>>
>> Use hadoop-lzo-0.4.7 or higher from
>> https://github.com/toddlipcon/hadoop-lzo
>>
>> Best regards,
>>
>>   - Andy
>>
>> --- On Thu, 12/16/10, Sandy Pratt <[email protected]> wrote:
>>
>>> From: Sandy Pratt <[email protected]>
>>> Subject: RE: Simple OOM crash?
>>> To: "[email protected]" <[email protected]>
>>> Cc: "Cosmin Lehene" <[email protected]>
>>> Date: Thursday, December 16, 2010, 4:00 PM
>>>
>>> The LZO jar installed is:
>>>
>>> hadoop-lzo-0.4.6.jar
>>>
>>> The native LZO libs are from EPEL (I think), installed on CentOS 5.5
>>> 64-bit:
>>>
>>> [had...@ets-lax-prod-hadoop-02 Linux-amd64-64]$ yum info lzo-devel
>>> Name        : lzo-devel
>>> Arch        : x86_64
>>> Version     : 2.02
>>> Release     : 2.el5.1
>>> Size        : 144 k
>>> Repo        : installed
>>> Summary     : Development files for the lzo library
>>> URL         : http://www.oberhumer.com/opensource/lzo/
>>> License     : GPL
>>> Description : LZO is a portable lossless data compression library
>>>             : written in ANSI C.
>>>             : It offers pretty fast compression and very fast
>>>             : decompression.
>>>             : This package contains development files needed for lzo.
>>>
>>> Is the direct buffer used only with LZO, or is it always involved
>>> with HBase reads/writes?
>>>
>>> Thanks for the help,
>>> Sandy
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:[email protected]]
>>> Sent: Thursday, December 16, 2010 15:50
>>> To: [email protected]
>>> Cc: Cosmin Lehene
>>> Subject: Re: Simple OOM crash?
>>>
>>> What LZO version are you using? You aren't running out of regular
>>> heap, you are running out of "Direct buffer memory", which is capped
>>> to prevent mishaps. There is a flag to increase that size:
>>>
>>> -XX:MaxDirectMemorySize=100m
>>>
>>> etc.
>>>
>>> enjoy,
>>> -ryan
>>>
>>> On Thu, Dec 16, 2010 at 3:07 PM, Sandy Pratt <[email protected]> wrote:
>>> > Hello HBasers,
>>> >
>>> > I had a regionserver crash recently, and in perusing the logs it
>>> > looks like it simply had a bit too little memory. I'm running with
>>> > 2200 MB heap on each regionserver. I plan to shave a bit off the
>>> > child VM allowance in favor of the regionserver to correct this,
>>> > probably bringing it up to 2500 MB.
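The -XX:MaxDirectMemorySize flag Ryan mentions would typically be wired into the region server's JVM options via hbase-env.sh. A minimal sketch — the 256m value is purely illustrative, and HBASE_REGIONSERVER_OPTS is assumed to be the hook in this HBase version's hbase-env.sh:

```shell
# hbase-env.sh sketch: raise the cap on direct (off-heap) buffer memory
# so the LZO codec's DirectByteBuffers have headroom. Value is an
# illustration, not a recommendation.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=256m"
```

This cap is separate from the -Xmx heap setting, which is why the log below can show free heap while the direct-buffer allocation still fails.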
>>> > My question is if there is any more specific memory allocation I
>>> > should make rather than simply giving more to the RS. I wonder
>>> > about this because of the following:
>>> >
>>> > load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198)
>>> >
>>> > which suggests to me that there was heap available, but the RS
>>> > couldn't use it for some reason.
>>> >
>>> > Conjecture: I do run with LZO compression, so I wonder if I could
>>> > be hitting that memory leak referenced earlier on the list. I know
>>> > there's a new version of the LZO library available that I should
>>> > upgrade to, but is it also possible to simply alter the table to
>>> > gzip compression and do a major compaction, then uninstall LZO
>>> > once that completes?
>>> >
>>> > Log follows:
>>> >
>>> > 2010-12-15 20:01:05,239 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> > Starting compaction on region
>>> > ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548
>>> > 2010-12-15 20:01:05,239 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>> > Major compaction triggered on store f1; time since last major
>>> > compaction 119928149ms
>>> > 2010-12-15 20:01:05,240 INFO org.apache.hadoop.hbase.regionserver.Store:
>>> > Started compaction of 2 file(s) in f1 of
>>> > ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548
>>> > into hdfs://ets-lax-prod-hadoop-01.corp.adobe.com:54310/hbase/ets.events/1063152548/.tmp,
>>> > sequenceid=25718885315
>>> > 2010-12-15 20:01:19,403 WARN org.apache.hadoop.hbase.regionserver.Store:
>>> > Not in set org.apache.hadoop.hbase.regionserver.storescan...@7466c84
>>> > 2010-12-15 20:01:19,572 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer:
>>> > Aborting region server
>>> > serverName=ets-lax-prod-hadoop-02.corp.adobe.com,60020,1289682554219,
>>> > load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198):
>>> > Uncaught exception in service thread
>>> > regionserver60020.compactor
>>> > java.lang.OutOfMemoryError: Direct buffer memory
>>> >         at java.nio.Bits.reserveMemory(Bits.java:656)
>>> >         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
>>> >         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>>> >         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:223)
>>> >         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>> >         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>> >         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>> >         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:198)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:391)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:377)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:348)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:530)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:495)
>>> >         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:817)
>>> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:811)
>>> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:670)
>>> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:722)
>>> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:671)
>>> >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:84)
>>> > 2010-12-15 20:01:19,586 INFO
>>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>>> > request=0.0, regions=709, stores=709, storefiles=731,
>>> > storefileIndexSize=418, memstoreSize=33, compactionQueueSize=15,
>>> > usedHeap=856, maxHeap=2198, blockCacheSize=366779472,
>>> > blockCacheFree=87883088, blockCacheCount=5494, blockCacheHitRatio=0
>>> > 2010-12-15 20:01:20,571 INFO org.apache.hadoop.ipc.HBaseServer:
>>> > Stopping server on 60020
>>> >
>>> > Thanks,
>>> >
>>> > Sandy
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
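The LZO-to-gzip switchover Sandy asks about above might look roughly like this in the hbase shell of that era. The table name `ets.events` and family `f1` come from the log; the exact alter syntax is a sketch, not verified against this HBase version:

```shell
# Sketch: switch family f1 of ets.events to gzip, then force a major
# compaction so existing LZO-compressed HFiles get rewritten as GZ.
hbase shell <<'EOF'
disable 'ets.events'
alter 'ets.events', {NAME => 'f1', COMPRESSION => 'GZ'}
enable 'ets.events'
major_compact 'ets.events'
EOF
```

One caveat worth noting: the LZO native libraries must remain installed until the major compaction has finished rewriting every store file, since the region server still needs the codec to read the old LZO blocks.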
