Thanks Sandy. Does setting -XX:MaxDirectMemorySize help trigger a GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted? (That would at least prevent swapping, which is nicer, though.)
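For the record, the limit's behaviour is easy to probe in isolation. The class below is a hypothetical standalone test, not code from any project in this thread; run it with e.g. -XX:MaxDirectMemorySize=64m:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    public class DirectBufferLimitProbe {
        public static void main(String[] args) {
            List<ByteBuffer> held = new ArrayList<ByteBuffer>();
            try {
                while (true) {
                    // Direct allocations count against -XX:MaxDirectMemorySize,
                    // not against the Java heap (-Xmx).
                    held.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1 MB each
                }
            } catch (OutOfMemoryError e) {
                // HotSpot's Bits.reserveMemory (visible in the stack trace
                // further down) triggers a System.gc() before giving up, so
                // unreferenced buffers get reclaimed first; the error only
                // fires when live buffers exhaust the budget.
                System.out.println("Hit the limit after " + held.size() + " MB: " + e);
            }
        }
    }

Because this test keeps strong references to every buffer, the GC attempt frees nothing and the OOME fires; drop the references and the loop runs indefinitely.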
I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.

Friso

On 5 Jan 2011, at 20:36, Sandy Pratt wrote:

> I was in a similar situation recently, with similar symptoms, and I experienced a crash very similar to yours. I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago. My workload is fairly write-heavy: I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.
>
> The suggestions I received were two: 1) update to the latest hadoop-lzo, and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).
>
> I took a third route - changing my tables back to gz compression for the time being while I figure out what to do. Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write-heavy workload; I'm not trying to serve an online workload with low latency or anything like that). At this point, I might not return to LZO.
>
> In general, I'm not convinced that "use LZO" is universally good advice for all HBase users. For one thing, I think it assumes that all installations are focused on low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed). Secondly, it assumes some things about where the performance bottleneck lives. For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch processing situation, you might be better served by a higher compression ratio, even if it's more computationally expensive.
>
> Sandy
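For reference, switching a table's compression the way Sandy describes looks roughly like the following with the 0.90-era Java client. This is a hedged sketch from memory (the modifyColumn signature has varied across HBase versions, so verify against your release); the table and family names are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;

    public class SwitchTableToGz {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            HColumnDescriptor family = new HColumnDescriptor("data");
            family.setCompressionType(Compression.Algorithm.GZ);
            admin.disableTable("mytable");                  // table must be offline to alter it
            admin.modifyColumn("mytable", "data", family);  // signature varies per version
            admin.enableTable("mytable");
        }
    }

New HFiles are then written as gz; existing LZO files are converted as they get compacted, so the on-disk savings Sandy saw arrive gradually.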
>> -----Original Message-----
>> From: Friso van Vollenhoven [mailto:[email protected]]
>> Sent: Tuesday, January 04, 2011 08:00
>> To: <[email protected]>
>> Subject: Re: problem with LZO compressor on write only loads
>>
>> I ran the job again, but with fewer other processes running on the same machine, so with more physical memory available to HBase. This was to see whether there was a point where it would stop allocating more buffers. When I did this, after many hours, one of the RSes crashed with an OOME. See here:
>>
>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
>> java.lang.OutOfMemoryError: Direct buffer memory
>>         at java.nio.Bits.reserveMemory(Bits.java:633)
>>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>
>> I am guessing the OS won't allocate any more memory to the process. As you can see, the used heap is nowhere near the max heap.
>>
>> Also, this seems to happen during compaction; I had not considered compactions as a suspect yet. I could try running with a larger compaction threshold and a higher blocking store files limit. Since this is a write-only load, that should not be a problem. In our normal operation, compactions and splits are quite common, though, because we do read-modify-write cycles a lot. Anyone else doing update-heavy work with LZO?
>>
>> Cheers,
>> Friso
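The compaction knobs Friso mentions live in hbase-site.xml on the region servers. A sketch with illustrative values; property names as of 0.90, defaults quoted from memory, so verify against your release:

    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <value>10</value>  <!-- default 3: wait for more store files before compacting -->
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>20</value>  <!-- default 7: block updates later under write-heavy load -->
    </property>

Fewer compactions means fewer trips through CodecPool.getCompressor(), and thus fewer chances to allocate fresh direct buffers, at the cost of more store files per region.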
>> On 4 Jan 2011, at 01:54, Todd Lipcon wrote:
>>
>>> Fishy. Are your cells particularly large? Or have you tuned the HFile block size at all?
>>>
>>> -Todd
>>>
>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <[email protected]> wrote:
>>>
>>>> I tried it, but it doesn't seem to help. The RS processes grow to 30Gb within minutes after the job starts.
>>>>
>>>> Any ideas?
>>>>
>>>> Friso
>>>>
>>>> On 3 Jan 2011, at 19:18, Todd Lipcon wrote:
>>>>
>>>>> Hi Friso,
>>>>>
>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>
>>>>> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
>>>>>
>>>>> Thanks
>>>>> -Todd
>>>>>
>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I seem to be running into a problem that occurs when using LZO compression under a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). There are several Hadoop LZO incarnations around, which is why I am pointing my question to this list.
>>>>>>
>>>>>> It looks like the compressor uses direct byte buffers to hold the original and compressed bytes in memory, so the native code can work with them without the JVM having to copy anything around. The direct buffers are possibly reused after a reinit() call, but will often be newly created in the init() method, because the existing buffer can be the wrong size for reuse. The latter case leaves the buffers previously used by the compressor instance eligible for garbage collection. I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet: the GC does not know about the native heap, and based on the state of the JVM heap there is no reason to finalize these objects. However, direct byte buffers are only freed in the finalizer, so the native heap keeps growing. On write-only loads, a full GC will rarely happen, because the used heap will not grow far beyond the memstores (no block cache is used). So what happens is that the machine starts swapping before the GC ever cleans up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier, because the referring objects would also be collected every now and then, or things would perhaps never get promoted to an older generation at all.
>>>>>>
>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb resident size (with a 16Gb heap), it shows a lot of near-64M anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to a HBase version that does not require the reinit(). Now I am on 0.90 with the new LZO, but I never ran a heavy load like this one on it, until now...
>>>>>>
>>>>>> Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use just one fixed-size buffer (they all appear near 64M anyway), or to reuse an existing buffer even when it is not the exact required size but large enough to make do (a sketch of this reuse idea follows at the bottom of the thread)? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.
>>>>>>
>>>>>> Thanks,
>>>>>> Friso
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
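A minimal sketch of the buffer-reuse idea from Friso's last paragraph. This is hypothetical helper code, not the actual hadoop-lzo implementation; it only illustrates keeping an oversized direct buffer alive instead of allocating a fresh one on every (re)init:

    import java.nio.ByteBuffer;

    public class BufferReuseSketch {
        // Return the existing direct buffer when it is already big enough;
        // allocate a new one only when the codec genuinely needs more room.
        static ByteBuffer reuseOrAllocate(ByteBuffer existing, int needed) {
            if (existing != null && existing.capacity() >= needed) {
                existing.clear();        // reset position/limit for the next block
                existing.limit(needed);  // expose exactly the bytes the codec will use
                return existing;         // no new native allocation, nothing to finalize
            }
            // The old buffer (if any) becomes garbage that only a GC and its
            // finalizer can reclaim: exactly the churn described above.
            return ByteBuffer.allocateDirect(needed);
        }
    }

Since the buffers observed in pmap are all near 64M anyway, a single fixed-size buffer per codec instance would make this take the reuse path almost every time.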
