On Thu, Jan 13, 2011 at 12:25 AM, Friso van Vollenhoven
<[email protected]> wrote:

> Hey Todd,
>
> I saw the patch. On what JVM (versions) have you tested this?

I tested on Sun JVM 1.6u22, but the undocumented calls I used have
definitely been around for a long time, so it ought to work on any Sun or
OpenJDK as far as I know.

> (Probably the wrong list for this, but: is there an officially supported
> JVM version for CDH3?)

We recommend Sun 1.6 >=u16, but not u18.

-Todd

> On 13 jan 2011, at 07:42, Todd Lipcon wrote:
>
> > On Wed, Jan 12, 2011 at 5:01 PM, Tatsuya Kawano
> > <[email protected]> wrote:
> >
> >>> And in some circumstances (like all the rigged tests I've attempted
> >>> to do) these get cleaned up nicely by the JVM. It seems only in
> >>> pretty large heaps in real workloads does the leak actually end up
> >>> running away.
> >>
> >> This issue should be circumstance dependent, as we don't have direct
> >> control over deallocating those buffers. We need them GCed, but they
> >> don't occupy the Java heap, so they never encourage the GC to run.
> >
> > Thanks to reflection and use of undocumented APIs, you can actually
> > free() a direct buffer - check out the patch referenced earlier in
> > this thread.
> >
> > Of course it probably doesn't work on other JVMs... oh well.
> >
> > -Todd
> >
> >> On Jan 13, 2011, at 8:30 AM, Todd Lipcon <[email protected]> wrote:
> >>
> >>> On Wed, Jan 12, 2011 at 3:25 PM, Tatsuya Kawano
> >>> <[email protected]> wrote:
> >>>
> >>>> Hi Friso and everyone,
> >>>>
> >>>> OK. We don't have to spend time juggling hadoop-core jars anymore,
> >>>> since Todd is working hard on enhancing hadoop-lzo behavior.
> >>>>
> >>>> I think your assumption is correct, but what I was trying to say
> >>>> was that HBase hasn't changed the way it uses Hadoop compressors
> >>>> since the HBase 0.20 release, while Hadoop added reinit() in 0.21.
> >>>> I verified that ASF Hadoop 0.21 and CDH3b3 have reinit(), and that
> >>>> ASF Hadoop 0.20.2 (including its append branch) and CDH3b2 don't.
> >>>> I saw you had no problem running HBase 0.89 on CDH3b2, so I thought
> >>>> HBase 0.90 would work fine on ASF Hadoop 0.20.2, because neither of
> >>>> them has reinit().
> >>>
> >>> Yep - but that jar isn't wire-compatible with a CDH3b3 cluster. So
> >>> if you have a CDH3b3 cluster for one of the other features included,
> >>> you need to use a 3b3 client jar as well, which includes the reinit
> >>> stuff.
> >>>
> >>>> HBase tries to create an output compression stream on each
> >>>> compression block, and one HFile flush will contain roughly 1000
> >>>> compression blocks. I think reinit() could get called 1000 times on
> >>>> one flush, and if hadoop-lzo allocates a 64MB block on reinit()
> >>>> (HBase's compression blocks are about 64KB, though), it will become
> >>>> pretty much what you're observing now.
> >>>
> >>> Yep - though I think it's only leaking a 64KB buffer for each in
> >>> 0.4.8. And in some circumstances (like all the rigged tests I've
> >>> attempted to do) these get cleaned up nicely by the JVM. It seems
> >>> only in pretty large heaps in real workloads does the leak actually
> >>> end up running away.
> >>>
> >>> -Todd
> >>>
> >>>> On Jan 13, 2011, at 7:50 AM, Todd Lipcon <[email protected]> wrote:
> >>>>
> >>>>> Can someone who is having this issue try checking out the
> >>>>> following git branch and rebuilding LZO?
> >>>>>
> >>>>> https://github.com/toddlipcon/hadoop-lzo/tree/realloc
> >>>>>
> >>>>> This definitely stems one leak of a 64KB direct buffer on every
> >>>>> reinit.
> >>>>>
> >>>>> -Todd
> >>>>>
> >>>>> On Wed, Jan 12, 2011 at 2:12 PM, Todd Lipcon <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Yea, you're definitely on the right track. Have you considered
> >>>>>> systems programming, Friso? :)
> >>>>>>
> >>>>>> Hopefully I'll have a candidate patch to LZO later today.
> >>>>>>
> >>>>>> -Todd
> >>>>>>
> >>>>>> On Wed, Jan 12, 2011 at 1:20 PM, Friso van Vollenhoven
> >>>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> My guess is indeed that it has to do with using the reinit()
> >>>>>>> method on compressors, making them long-lived instead of
> >>>>>>> throwaway, together with the LZO implementation of reinit(),
> >>>>>>> which magically causes NIO buffer objects not to be finalized
> >>>>>>> and, as a result, not to release their native allocations. It's
> >>>>>>> just a theory and I haven't had the time to properly verify it
> >>>>>>> (unfortunately, I spend most of my time writing application
> >>>>>>> code), but Todd said he will be looking into it further. I
> >>>>>>> browsed the LZO code to see what was going on there, but with my
> >>>>>>> limited knowledge of the HBase code it would be bold to say that
> >>>>>>> this is for sure the case. It would be my first direction of
> >>>>>>> investigation. I would add some logging to the LZO code where
> >>>>>>> new direct byte buffers are created, to log how often that
> >>>>>>> happens and what size they are, and then redo the workload that
> >>>>>>> shows the leak. Together with some profiling, you should be able
> >>>>>>> to see how long it takes for these to get finalized.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Friso
> >>>>>>>
> >>>>>>> On 12 jan 2011, at 20:08, Stack wrote:
> >>>>>>>
> >>>>>>>> 2011/1/12 Friso van Vollenhoven <[email protected]>:
> >>>>>>>>> No, I haven't. But the Hadoop (mapreduce) LZO compression is
> >>>>>>>>> not the problem. Compressing the map output using LZO works
> >>>>>>>>> just fine. The problem is HBase LZO compression. The region
> >>>>>>>>> server process is the one with the memory leak...
> >>>>>>>>
> >>>>>>>> (Sorry for the dumb question, Friso.) But HBase is leaking
> >>>>>>>> because we make use of the Compression API in a manner that
> >>>>>>>> produces leaks?
> >>>>>>>> Thanks,
> >>>>>>>> St.Ack

--
Todd Lipcon
Software Engineer, Cloudera
