On Fri, Dec 17, 2010 at 2:32 PM, Sandy Pratt <[email protected]> wrote:
> Todd,
>
> While we're on the subject, and since you seem to know LZO well, can you
> answer a few questions that have been playing around in my mind lately?
>
> 1) Does GZ also use the Direct Memory Buffer like LZO does?
I don't know much about the gzip codec, but I believe so long as you're
using the native one (i.e. have the Hadoop native libraries installed) it
is very similar, yes.

> 2) What size do you run with for that buffer? I kicked it up to 512m the
> other day and I haven't seen problems, but I wonder if that's overkill.

Which buffer are you referring to? I don't do any particular tuning for
the LZO codec. I do usually set io.file.buffer.size to 128KB in Hadoop,
but that's at a different layer.

> 3) How do you think LZO memory use compares to GZ? The reason I ask is
> because ISTR reading that GZ is very light on memory. If it's
> significantly lighter than LZO, it might be worth my while to use GZ
> instead, even though it's slower than LZO, and use the freed memory to
> allocate another map slot.

All the LZO buffers are pooled and pretty transient so long as there
isn't a leak (like the bug you hit). Without a leak it should be
responsible for <1 MB of memory usage, in my experience.

Thanks
-Todd

> -----Original Message-----
> From: Sandy Pratt [mailto:[email protected]]
> Sent: Friday, December 17, 2010 14:04
> To: [email protected]
> Subject: RE: Simple OOM crash?
>
> That worked. Thanks!
>
> -----Original Message-----
> From: Todd Lipcon [mailto:[email protected]]
> Sent: Friday, December 17, 2010 13:54
> To: [email protected]
> Subject: Re: Simple OOM crash?
>
> Hi Sandy,
>
> I've seen that error on GitHub as well. Try using the git:// URL instead
> of the http:// URL. The http transport in git is a bit buggy.
>
> Worst case, there's also an option to download a tarball there.
>
> -Todd
>
> On Fri, Dec 17, 2010 at 10:59 AM, Sandy Pratt <[email protected]> wrote:
>> Thanks all for your help.
>>
>> I set about to update the hadoop-lzo jar using Todd Lipcon's git repo
>> (https://github.com/toddlipcon/hadoop-lzo), and I encountered an error.
>> I'm not a git user, so I could be doing something wrong, but I'm not
>> sure what.
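Todd's advice upthread — swap the flaky http transport for the git:// protocol — can be sketched like this. The remote name `origin` and branch `master` are assumptions about how the clone was made, and the git commands are shown as comments since they need network access:

```shell
# Sketch: derive the git:// URL from the http:// one that failed.
old_url="http://github.com/toddlipcon/hadoop-lzo.git"
new_url="${old_url/#http:/git:}"   # swap only the leading scheme
echo "$new_url"

# Then, inside the existing clone (assumed remote/branch names):
#   git remote set-url origin "$new_url"
#   git pull origin master
```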
>> Has something changed with this repo in the last month or two?
>>
>> The error is pasted below:
>>
>> [had...@ets-lax-prod-hadoop-01 hadoop-lzo]$ git pull
>> walk 7cbf6e85ad992faac880ef54a78ce926b6c02bda
>> walk fdbddcafd8276497d0181d40d72756336d204374
>> Getting alternates list for http://github.com/toddlipcon/hadoop-lzo.git
>> Also look at http://github.com/network/312869.git/
>> error: The requested URL returned error: 502 (curl_result = 22,
>> http_code = 502, sha1 = 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4)
>> Getting pack list for http://github.com/toddlipcon/hadoop-lzo.git
>> Getting pack list for http://github.com/network/312869.git/
>> error: The requested URL returned error: 502
>> error: Unable to find 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4 under
>> http://github.com/toddlipcon/hadoop-lzo.git
>> Cannot obtain needed commit 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4
>> while processing commit fdbddcafd8276497d0181d40d72756336d204374.
>> fatal: Fetch failed.
>>
>> Thanks,
>>
>> Sandy
>>
>> -----Original Message-----
>> From: Andrew Purtell [mailto:[email protected]]
>> Sent: Thursday, December 16, 2010 17:22
>> To: [email protected]
>> Cc: Cosmin Lehene
>> Subject: RE: Simple OOM crash?
>>
>> Use hadoop-lzo-0.4.7 or higher from
>> https://github.com/toddlipcon/hadoop-lzo
>>
>> Best regards,
>>
>>   - Andy
>>
>> --- On Thu, 12/16/10, Sandy Pratt <[email protected]> wrote:
>>
>>> From: Sandy Pratt <[email protected]>
>>> Subject: RE: Simple OOM crash?
>>> To: "[email protected]" <[email protected]>
>>> Cc: "Cosmin Lehene" <[email protected]>
>>> Date: Thursday, December 16, 2010, 4:00 PM
>>>
>>> The LZO jar installed is:
>>>
>>> hadoop-lzo-0.4.6.jar
>>>
>>> The native LZO libs are from EPEL (I think), installed on CentOS 5.5
>>> 64-bit:
>>>
>>> [had...@ets-lax-prod-hadoop-02 Linux-amd64-64]$ yum info lzo-devel
>>> Name        : lzo-devel
>>> Arch        : x86_64
>>> Version     : 2.02
>>> Release     : 2.el5.1
>>> Size        : 144 k
>>> Repo        : installed
>>> Summary     : Development files for the lzo library
>>> URL         : http://www.oberhumer.com/opensource/lzo/
>>> License     : GPL
>>> Description : LZO is a portable lossless data compression library
>>>             : written in ANSI C.
>>>             : It offers pretty fast compression and very fast
>>>             : decompression.
>>>             : This package contains development files needed for lzo.
>>>
>>> Is the direct buffer used only with LZO, or is it always involved
>>> with HBase reads/writes?
>>>
>>> Thanks for the help,
>>> Sandy
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:[email protected]]
>>> Sent: Thursday, December 16, 2010 15:50
>>> To: [email protected]
>>> Cc: Cosmin Lehene
>>> Subject: Re: Simple OOM crash?
>>>
>>> What LZO version are you using? You aren't running out of regular
>>> heap, you are running out of "Direct buffer memory", which is capped
>>> to prevent mishaps. There is a flag to increase that size:
>>>
>>> -XX:MaxDirectMemorySize=100m
>>>
>>> etc.
>>>
>>> enjoy,
>>> -ryan
>>>
>>> On Thu, Dec 16, 2010 at 3:07 PM, Sandy Pratt <[email protected]> wrote:
>>> > Hello HBasers,
>>> >
>>> > I had a regionserver crash recently, and in perusing the logs it
>>> > looks like it simply had a bit too little memory. I'm running with
>>> > 2200 MB heap on each regionserver. I plan to shave a bit off the
>>> > child VM allowance in favor of the regionserver to correct this,
>>> > probably bringing it up to 2500 MB.
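The -XX:MaxDirectMemorySize flag Ryan mentions would typically be wired into the region server's JVM options via hbase-env.sh. A minimal sketch — the 256m value is purely illustrative, and HBASE_REGIONSERVER_OPTS is assumed to be the hook in this HBase version's hbase-env.sh:

```shell
# hbase-env.sh sketch: raise the cap on direct (off-heap) buffer memory
# so the LZO codec's DirectByteBuffers have headroom. Value is an
# illustration, not a recommendation.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=256m"
```

This cap is separate from the -Xmx heap setting, which is why the log below can show free heap while the direct-buffer allocation still fails.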
>>> > My question is if there is any more specific memory allocation I
>>> > should make rather than simply giving more to the RS. I wonder
>>> > about this because of the following:
>>> >
>>> > load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198)
>>> >
>>> > which suggests to me that there was heap available, but the RS
>>> > couldn't use it for some reason.
>>> >
>>> > Conjecture: I do run with LZO compression, so I wonder if I could
>>> > be hitting that memory leak referenced earlier on the list. I know
>>> > there's a new version of the LZO library available that I should
>>> > upgrade to, but is it also possible to simply alter the table to
>>> > gzip compression and do a major compaction, then uninstall LZO
>>> > once that completes?
>>> >
>>> > Log follows:
>>> >
>>> > 2010-12-15 20:01:05,239 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> > Starting compaction on region
>>> > ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548
>>> > 2010-12-15 20:01:05,239 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>> > Major compaction triggered on store f1; time since last major
>>> > compaction 119928149ms
>>> > 2010-12-15 20:01:05,240 INFO org.apache.hadoop.hbase.regionserver.Store:
>>> > Started compaction of 2 file(s) in f1 of
>>> > ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548
>>> > into hdfs://ets-lax-prod-hadoop-01.corp.adobe.com:54310/hbase/ets.events/1063152548/.tmp,
>>> > sequenceid=25718885315
>>> > 2010-12-15 20:01:19,403 WARN org.apache.hadoop.hbase.regionserver.Store:
>>> > Not in set org.apache.hadoop.hbase.regionserver.storescan...@7466c84
>>> > 2010-12-15 20:01:19,572 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer:
>>> > Aborting region server
>>> > serverName=ets-lax-prod-hadoop-02.corp.adobe.com,60020,1289682554219,
>>> > load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198):
>>> > Uncaught exception in service thread
>>> > regionserver60020.compactor
>>> > java.lang.OutOfMemoryError: Direct buffer memory
>>> >         at java.nio.Bits.reserveMemory(Bits.java:656)
>>> >         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
>>> >         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>>> >         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:223)
>>> >         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>> >         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>> >         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>> >         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:198)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:391)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:377)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:348)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:530)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:495)
>>> >         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:817)
>>> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:811)
>>> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:670)
>>> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:722)
>>> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:671)
>>> >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:84)
>>> > 2010-12-15 20:01:19,586 INFO
>>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>>> > request=0.0, regions=709, stores=709, storefiles=731,
>>> > storefileIndexSize=418, memstoreSize=33, compactionQueueSize=15,
>>> > usedHeap=856, maxHeap=2198, blockCacheSize=366779472,
>>> > blockCacheFree=87883088, blockCacheCount=5494, blockCacheHitRatio=0
>>> > 2010-12-15 20:01:20,571 INFO org.apache.hadoop.ipc.HBaseServer:
>>> > Stopping server on 60020
>>> >
>>> > Thanks,
>>> >
>>> > Sandy
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
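The LZO-to-gzip switchover Sandy asks about above might look roughly like this in the hbase shell of that era. The table name `ets.events` and family `f1` come from the log; the exact alter syntax is a sketch, not verified against this HBase version:

```shell
# Sketch: switch family f1 of ets.events to gzip, then force a major
# compaction so existing LZO-compressed HFiles get rewritten as GZ.
hbase shell <<'EOF'
disable 'ets.events'
alter 'ets.events', {NAME => 'f1', COMPRESSION => 'GZ'}
enable 'ets.events'
major_compact 'ets.events'
EOF
```

One caveat worth noting: the LZO native libraries must remain installed until the major compaction has finished rewriting every store file, since the region server still needs the codec to read the old LZO blocks.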
