Todd,

While we're on the subject, and since you seem to know LZO well, could you answer
a few questions that have been rattling around in my mind lately?

1) Does GZ also use direct buffer memory the way LZO does?

2) What size do you run with for that buffer?  I kicked it up to 512m the other
day and haven't seen problems, but I wonder if that's overkill.

3) How do you think LZO's memory use compares to GZ's?  I ask because ISTR
reading that GZ is very light on memory.  If it's significantly lighter than
LZO, it might be worth my while to use GZ instead, even though it's slower,
and put the freed memory toward another map slot.
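
For reference, here's how I'm setting the direct memory limit at the moment,
in hbase-env.sh (a sketch; I'm assuming HBASE_REGIONSERVER_OPTS is the right
hook for this, so correct me if not):

    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=512m"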

Thanks for your help,

Sandy

-----Original Message-----
From: Sandy Pratt [mailto:[email protected]] 
Sent: Friday, December 17, 2010 14:04
To: [email protected]
Subject: RE: Simple OOM crash?

That worked.  Thanks!

-----Original Message-----
From: Todd Lipcon [mailto:[email protected]]
Sent: Friday, December 17, 2010 13:54
To: [email protected]
Subject: Re: Simple OOM crash?

Hi Sandy,

I've seen that error with GitHub as well. Try using the git:// URL instead of
the http:// URL; the http transport in git is a bit buggy.
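
For example, a fresh clone over the git protocol:

    git clone git://github.com/toddlipcon/hadoop-lzo.git

(or point your existing checkout's remote at the git:// URL) should avoid the
http walker entirely.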

Worst case, there's also an option to download a tarball there.

-Todd

On Fri, Dec 17, 2010 at 10:59 AM, Sandy Pratt <[email protected]> wrote:
> Thanks all for your help.
>
> I set about updating the hadoop-lzo jar using Todd Lipcon's git repo
> (https://github.com/toddlipcon/hadoop-lzo), and I encountered an error.  I'm
> not a git user, so I could be doing something wrong, but I'm not sure what.
> Has something changed with this repo in the last month or two?
>
> The error is pasted below:
>
> [had...@ets-lax-prod-hadoop-01 hadoop-lzo]$ git pull
> walk 7cbf6e85ad992faac880ef54a78ce926b6c02bda
> walk fdbddcafd8276497d0181d40d72756336d204374
> Getting alternates list for http://github.com/toddlipcon/hadoop-lzo.git
> Also look at http://github.com/network/312869.git/
> error: The requested URL returned error: 502 (curl_result = 22, http_code = 502, sha1 = 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4)
> Getting pack list for http://github.com/toddlipcon/hadoop-lzo.git
> Getting pack list for http://github.com/network/312869.git/
> error: The requested URL returned error: 502
> error: Unable to find 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4 under http://github.com/toddlipcon/hadoop-lzo.git
> Cannot obtain needed commit 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4 while processing commit fdbddcafd8276497d0181d40d72756336d204374.
> fatal: Fetch failed.
>
>
> Thanks,
>
> Sandy
>
>
> -----Original Message-----
> From: Andrew Purtell [mailto:[email protected]]
> Sent: Thursday, December 16, 2010 17:22
> To: [email protected]
> Cc: Cosmin Lehene
> Subject: RE: Simple OOM crash?
>
> Use hadoop-lzo-0.4.7 or higher from
> https://github.com/toddlipcon/hadoop-lzo
>
>
> Best regards,
>
>    - Andy
>
>
> --- On Thu, 12/16/10, Sandy Pratt <[email protected]> wrote:
>
>> From: Sandy Pratt <[email protected]>
>> Subject: RE: Simple OOM crash?
>> To: "[email protected]" <[email protected]>
>> Cc: "Cosmin Lehene" <[email protected]>
>> Date: Thursday, December 16, 2010, 4:00 PM
>>
>> The LZO jar installed is:
>>
>> hadoop-lzo-0.4.6.jar
>>
>> The native LZO libs are from EPEL (I think), installed on CentOS 5.5
>> 64-bit:
>>
>> [had...@ets-lax-prod-hadoop-02 Linux-amd64-64]$ yum info lzo-devel
>> Name       : lzo-devel
>> Arch       : x86_64
>> Version    : 2.02
>> Release    : 2.el5.1
>> Size       : 144 k
>> Repo       : installed
>> Summary    : Development files for the lzo library
>> URL        : http://www.oberhumer.com/opensource/lzo/
>> License    : GPL
>> Description: LZO is a portable lossless data compression library written in ANSI C.
>>            : It offers pretty fast compression and very fast decompression.
>>            : This package contains development files needed for lzo.
>>
>> Is the direct buffer used only with LZO, or is it always involved 
>> with HBase read/writes?
>>
>> Thanks for the help,
>> Sandy
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:[email protected]]
>>
>> Sent: Thursday, December 16, 2010 15:50
>> To: [email protected]
>> Cc: Cosmin Lehene
>> Subject: Re: Simple OOM crash?
>>
>> What LZO version are you using?  You aren't running out of regular
>> heap; you're running out of "Direct buffer memory", which is capped
>> to prevent mishaps.  There is a flag to increase that size:
>>
>> -XX:MaxDirectMemorySize=100m
>>
>> etc
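>>
>> If you want to see the cap in isolation, here's a toy Java program (just
>> a sketch, nothing HBase-specific) that shows the same failure mode:
>>
>> import java.nio.ByteBuffer;
>> import java.util.ArrayList;
>> import java.util.List;
>>
>> // Run with e.g. -XX:MaxDirectMemorySize=100m and this dies with
>> // "java.lang.OutOfMemoryError: Direct buffer memory" even though
>> // the regular -Xmx heap is nearly empty.
>> public class DirectBufferCap {
>>     public static void main(String[] args) {
>>         List<ByteBuffer> buffers = new ArrayList<ByteBuffer>();
>>         while (true) {
>>             // Direct buffers live outside the Java heap and count
>>             // against MaxDirectMemorySize, not against -Xmx.
>>             buffers.add(ByteBuffer.allocateDirect(16 * 1024 * 1024));
>>             System.out.println((16 * buffers.size()) + " MB of direct buffers");
>>         }
>>     }
>> }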
>>
>> enjoy,
>> -ryan
>>
>> On Thu, Dec 16, 2010 at 3:07 PM, Sandy Pratt <[email protected]>
>> wrote:
>> > Hello HBasers,
>> >
>> > I had a regionserver crash recently, and in perusing the logs it looks
>> > like it simply had a bit too little memory.  I'm running with a 2200 MB
>> > heap on each regionserver.  I plan to shave a bit off the child VM
>> > allowance in favor of the regionserver to correct this, probably
>> > bringing it up to 2500 MB.  My question is whether there is any more
>> > specific memory allocation I should make rather than simply giving more
>> > to the RS.  I wonder about this because of the following:
>> >
>> > load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198)
>> >
>> > which suggests to me that there was heap available, but the RS couldn't
>> > use it for some reason.
>> >
>> > Conjecture: I do run with LZO compression, so I wonder if I could be
>> > hitting that memory leak referenced earlier on the list.  I know
>> > there's a new version of the LZO library available that I should
>> > upgrade to, but is it also possible to simply alter the table to gzip
>> > compression and do a major compaction, then uninstall LZO once that
>> > completes?
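>> >
>> > Concretely, I'm picturing something like this in the hbase shell
>> > (sketching from memory, so the exact syntax may be off):
>> >
>> > disable 'ets.events'
>> > alter 'ets.events', {NAME => 'f1', COMPRESSION => 'GZ'}
>> > enable 'ets.events'
>> > major_compact 'ets.events'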
>> >
>> > Log follows:
>> >
>> > 2010-12-15 20:01:05,239 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548
>> > 2010-12-15 20:01:05,239 DEBUG org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered on store f1; time since last major compaction 119928149ms
>> > 2010-12-15 20:01:05,240 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction of 2 file(s) in f1 of ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548 into hdfs://ets-lax-prod-hadoop-01.corp.adobe.com:54310/hbase/ets.events/1063152548/.tmp, sequenceid=25718885315
>> > 2010-12-15 20:01:19,403 WARN org.apache.hadoop.hbase.regionserver.Store: Not in set org.apache.hadoop.hbase.regionserver.storescan...@7466c84
>> > 2010-12-15 20:01:19,572 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region server serverName=ets-lax-prod-hadoop-02.corp.adobe.com,60020,1289682554219, load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198): Uncaught exception in service thread regionserver60020.compactor
>> > java.lang.OutOfMemoryError: Direct buffer memory
>> >        at java.nio.Bits.reserveMemory(Bits.java:656)
>> >        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
>> >        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>> >        at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:223)
>> >        at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>> >        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>> >        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>> >        at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:198)
>> >        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:391)
>> >        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:377)
>> >        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:348)
>> >        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:530)
>> >        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:495)
>> >        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:817)
>> >        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:811)
>> >        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:670)
>> >        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:722)
>> >        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:671)
>> >        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:84)
>> > 2010-12-15 20:01:19,586 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=709, stores=709, storefiles=731, storefileIndexSize=418, memstoreSize=33, compactionQueueSize=15, usedHeap=856, maxHeap=2198, blockCacheSize=366779472, blockCacheFree=87883088, blockCacheCount=5494, blockCacheHitRatio=0
>> > 2010-12-15 20:01:20,571 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
>> >
>> > Thanks,
>> >
>> > Sandy
>> >
>> >
>>
>
>
>
>



--
Todd Lipcon
Software Engineer, Cloudera
