Hi Friso and everyone,

OK, we don't have to spend time juggling hadoop-core jars anymore, since Todd is working hard on improving the hadoop-lzo behavior.
I think your assumption is correct, but what I was trying to say is that HBase hasn't changed the way it uses Hadoop compressors since the HBase 0.20 release, while Hadoop added reinit() in 0.21. I verified that ASF Hadoop 0.21 and CDH3b3 have reinit(), and that ASF Hadoop 0.20.2 (including its append branch) and CDH3b2 don't. I saw you had no problem running HBase 0.89 on CDH3b2, so I thought HBase 0.90 would work fine on ASF Hadoop 0.20.2, since neither of them has reinit().

HBase creates an output compression stream for each compression block, and one HFile flush will contain roughly 1000 compression blocks. So reinit() could get called 1000 times on a single flush, and if hadoop-lzo allocates a 64MB buffer on each reinit() (even though HBase's compression blocks are only about 64KB), that would pretty much match what you're observing now. (I've put a small illustrative sketch of this allocation pattern, and of the logging Friso suggests, after the quoted thread below.)

Thanks,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan


On Jan 13, 2011, at 7:50 AM, Todd Lipcon <[email protected]> wrote:

> Can someone who is having this issue try checking out the following git
> branch and rebuilding LZO?
>
> https://github.com/toddlipcon/hadoop-lzo/tree/realloc
>
> This definitely stems one leak of a 64KB direct buffer on every reinit.
>
> -Todd
>
> On Wed, Jan 12, 2011 at 2:12 PM, Todd Lipcon <[email protected]> wrote:
>
>> Yea, you're definitely on the right track. Have you considered systems
>> programming, Friso? :)
>>
>> Hopefully I'll have a candidate patch to LZO later today.
>>
>> -Todd
>>
>> On Wed, Jan 12, 2011 at 1:20 PM, Friso van Vollenhoven <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> My guess is indeed that it has to do with using the reinit() method on
>>> compressors to make them long-lived instead of throwaway, together with
>>> the LZO implementation of reinit(), which magically causes NIO buffer
>>> objects not to be finalized and, as a result, not to release their native
>>> allocations. It's just a theory and I haven't had the time to properly
>>> verify it (unfortunately, I spend most of my time writing application
>>> code), but Todd said he will be looking into it further. I browsed the
>>> LZO code to see what was going on there, but with my limited knowledge of
>>> the HBase code it would be bold to say that this is for sure the case. It
>>> would be my first direction of investigation. I would add some logging to
>>> the LZO code where new direct byte buffers are created, to log how often
>>> that happens and what size they are, and then redo the workload that
>>> shows the leak. Together with some profiling you should be able to see
>>> how long it takes for these to get finalized.
>>>
>>> Cheers,
>>> Friso
>>>
>>>
>>> On 12 Jan 2011, at 20:08, Stack wrote:
>>>
>>>> 2011/1/12 Friso van Vollenhoven <[email protected]>:
>>>>> No, I haven't. But the Hadoop (mapreduce) LZO compression is not the
>>>>> problem. Compressing the map output using LZO works just fine. The
>>>>> problem is HBase LZO compression. The region server process is the one
>>>>> with the memory leak...
>>>>
>>>> (Sorry for the dumb question, Friso.) But HBase is leaking because we
>>>> make use of the Compression API in a manner that produces leaks?
>>>> Thanks,
>>>> St.Ack
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
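P.S. To make the allocation pattern concrete, here is a minimal, hypothetical sketch. It is not the actual hadoop-lzo code and the class names are made up; it only shows the shape of the suspected problem: a compressor that allocates a fresh direct buffer on every reinit() and drops the old one, so the native memory is only reclaimed whenever the GC gets around to the abandoned buffer objects.

    // Hypothetical sketch only -- this is NOT the actual hadoop-lzo code.
    // It shows the suspected pattern: reinit() allocates a new direct buffer
    // and silently drops the previous one.
    import java.nio.ByteBuffer;

    public class LeakyCompressorSketch {

      // Made-up stand-in for an LZO compressor.
      static class FakeCompressor {
        private ByteBuffer uncompressedBuf;

        void reinit(int bufferSize) {
          // The suspected leak: the previous direct buffer is simply dropped.
          // Its native allocation lingers until the GC processes it, and the
          // GC feels little pressure because the on-heap object is tiny.
          uncompressedBuf = ByteBuffer.allocateDirect(bufferSize);
          uncompressedBuf.clear();
        }
      }

      public static void main(String[] args) {
        FakeCompressor compressor = new FakeCompressor();
        // One HFile flush: roughly 1000 compression blocks, one reinit() each.
        // The 64KB figure Todd mentions keeps this sketch runnable; with the
        // 64MB figure speculated above, a single flush would request tens of
        // gigabytes of native memory before the old buffers are reclaimed.
        for (int block = 0; block < 1000; block++) {
          compressor.reinit(64 * 1024);
        }
        System.out.println("1000 reinit() calls issued; old direct buffers await collection.");
      }
    }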

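P.P.S. And a rough sketch of the logging Friso suggests. The helper name is made up; the idea is just to funnel the codec's ByteBuffer.allocateDirect() calls through one place so you can see how often they happen and how large the buffers are.

    // Hypothetical helper (the name is made up) for the logging Friso
    // suggests: route the codec's direct-buffer allocations through one
    // place that counts and reports them.
    import java.nio.ByteBuffer;
    import java.util.concurrent.atomic.AtomicLong;

    public final class DirectBufferTracker {

      private static final AtomicLong COUNT = new AtomicLong();
      private static final AtomicLong TOTAL_BYTES = new AtomicLong();

      private DirectBufferTracker() {}

      // Call this from wherever the LZO code currently calls
      // ByteBuffer.allocateDirect() directly.
      public static ByteBuffer allocateDirect(int capacity) {
        long n = COUNT.incrementAndGet();
        long total = TOTAL_BYTES.addAndGet(capacity);
        // In the real codec you would use its logger rather than stdout.
        System.out.printf("direct buffer #%d: %d bytes (cumulative %d bytes)%n",
            n, capacity, total);
        return ByteBuffer.allocateDirect(capacity);
      }
    }

Re-running the workload that shows the leak with this in place, plus some profiling, should show how many buffers are created, how big they are, and how long they survive before their native memory is released.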