On Thu, Jan 13, 2011 at 12:25 AM, Friso van Vollenhoven <
[email protected]> wrote:

> Hey Todd,
>
> I saw the patch. On what JVM (versions) have you tested this?
>

I tested on Sun JVM 1.6u22, but the undocumented calls I used have
definitely been around for a long time, so it ought to work on any Sun or
OpenJDK as far as I know.
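
For anyone reading along, here is a minimal sketch of the general technique
(it is not the patch itself). It assumes a Sun/OpenJDK-style direct buffer
whose implementation class exposes a public cleaner() method, and that the
returned object has a clean() method; both are undocumented internals. The
names DirectBufferFree and tryFree are made up for illustration. On JVMs
where these internals are missing or reflective access is blocked, the
sketch just returns false and leaves the buffer to the GC.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only; not code from hadoop-lzo.
final class DirectBufferFree {

    /**
     * Tries to release the native memory behind a direct buffer right away.
     * Returns true if the undocumented cleaner was found and invoked,
     * false otherwise. The buffer must never be touched again after a
     * successful call.
     */
    static boolean tryFree(ByteBuffer buf) {
        if (buf == null || !buf.isDirect()) {
            return false; // heap buffers have no native allocation to free
        }
        try {
            // Assumption: the implementation class has a public cleaner().
            Method cleanerMethod = buf.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buf);
            if (cleaner == null) {
                return false; // e.g. a slice or duplicate that owns no memory
            }
            // Assumption: the cleaner object has a public clean().
            Method cleanMethod = cleaner.getClass().getMethod("clean");
            cleanMethod.setAccessible(true);
            cleanMethod.invoke(cleaner);
            return true;
        } catch (Exception e) {
            // Missing method, blocked access, or another JVM: give up quietly.
            return false;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        System.out.println("freed eagerly: " + tryFree(buf));
    }
}
```

Whether this returns true depends entirely on the JVM, so callers should
treat it as a best-effort optimization and still drop all references to the
buffer afterwards.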


>
> (Probably the wrong list for this, but: is there an officially supported JVM
> version for CDH3?)
>
>
We recommend the Sun JVM 1.6, u16 or later, but not u18.

-Todd

>
>
> On 13 jan 2011, at 07:42, Todd Lipcon wrote:
>
> > On Wed, Jan 12, 2011 at 5:01 PM, Tatsuya Kawano <[email protected]
> >wrote:
> >
> >>> And in some circumstances (like all the rigged tests I've attempted to
> >>> do) these get cleaned up nicely by the JVM. It seems only in pretty
> >>> large heaps in real workloads does the leak actually end up running away.
> >>
> >> This issue should be circumstance-dependent, as we don't have direct
> >> control over deallocating those buffers. We need them GCed, but they
> >> never occupy the Java heap, so they do nothing to encourage the GC to run.
> >>
> >
> > Thanks to reflection and use of undocumented APIs, you can actually
> > free() a direct buffer - check out the patch referenced earlier in this
> > thread.
> >
> > Of course it probably doesn't work on other JVMs... oh well.
> >
> > -Todd
> >
> >>
> >>
> >> On Jan 13, 2011, at 8:30 AM, Todd Lipcon <[email protected]> wrote:
> >>
> >>> On Wed, Jan 12, 2011 at 3:25 PM, Tatsuya Kawano <[email protected]
> >>> wrote:
> >>>
> >>>> Hi Friso and everyone,
> >>>>
> >>>> OK. We don't have to spend time juggling hadoop-core jars anymore,
> >>>> since Todd is working hard on enhancing hadoop-lzo behavior.
> >>>>
> >>>> I think your assumption is correct, but what I was trying to say was
> >>>> that HBase hasn't changed the way it uses Hadoop compressors since the
> >>>> HBase 0.20 release, while Hadoop added reinit() in 0.21. I verified that
> >>>> ASF Hadoop 0.21 and CDH3b3 have reinit(), and that ASF Hadoop 0.20.2
> >>>> (including its append branch) and CDH3b2 don't. I saw you had no problem
> >>>> running HBase 0.89 on CDH3b2, so I thought HBase 0.90 would work fine on
> >>>> ASF Hadoop 0.20.2, because neither of them has reinit().
> >>>>
> >>>>
> >>> Yep - but that jar isn't wire-compatible with a CDH3b3 cluster. So if
> >>> you have a CDH3b3 cluster for one of the other features included, you
> >>> need to use a 3b3 client jar as well, which includes the reinit stuff.
> >>>
> >>>
> >>>> HBase tries to create an output compression stream on each compression
> >>>> block, and one HFile flush will contain roughly 1000 compression
> >>>> blocks. I think reinit() could get called 1000 times on one flush, and
> >>>> if hadoop-lzo allocates a 64MB block on reinit() (HBase's compression
> >>>> blocks are only about 64KB, though), it would look pretty much like
> >>>> what you're observing now.
> >>>>
> >>>>
> >>> Yep - though I think it's only leaking a 64K buffer for each in 0.4.8.
> >>> And in some circumstances (like all the rigged tests I've attempted to
> >>> do) these get cleaned up nicely by the JVM. It seems only in pretty
> >>> large heaps in real workloads does the leak actually end up running away.
> >>>
> >>> -Todd
> >>>
> >>>>
> >>>> On Jan 13, 2011, at 7:50 AM, Todd Lipcon <[email protected]> wrote:
> >>>>
> >>>>> Can someone who is having this issue try checking out the following
> >>>>> git branch and rebuilding LZO?
> >>>>>
> >>>>> https://github.com/toddlipcon/hadoop-lzo/tree/realloc
> >>>>>
> >>>>> This definitely stems one leak of a 64KB direct buffer on every reinit.
> >>>>>
> >>>>> -Todd
> >>>>>
> >>>>> On Wed, Jan 12, 2011 at 2:12 PM, Todd Lipcon <[email protected]> wrote:
> >>>>>
> >>>>>> Yea, you're definitely on the right track. Have you considered
> >>>>>> systems programming, Friso? :)
> >>>>>>
> >>>>>> Hopefully should have a candidate patch to LZO later today.
> >>>>>>
> >>>>>> -Todd
> >>>>>>
> >>>>>> On Wed, Jan 12, 2011 at 1:20 PM, Friso van Vollenhoven <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>> My guess is indeed that it has to do with using the reinit() method
> >>>>>>> on compressors and making them long-lived instead of throwaway,
> >>>>>>> together with the LZO implementation of reinit(), which magically
> >>>>>>> causes NIO buffer objects not to be finalized and, as a result, not
> >>>>>>> to release their native allocations. It's just a theory and I haven't
> >>>>>>> had the time to properly verify this (unfortunately, I spend most of
> >>>>>>> my time writing application code), but Todd said he will be looking
> >>>>>>> into it further. I browsed the LZO code to see what was going on
> >>>>>>> there, but with my limited knowledge of the HBase code it would be
> >>>>>>> bold to say that this is for sure the case. It would be my first
> >>>>>>> direction of investigation. I would add some logging to the LZO code
> >>>>>>> where new direct byte buffers are created, to log how often that
> >>>>>>> happens and what size they are, and then redo the workload that shows
> >>>>>>> the leak. Together with some profiling, you should be able to see how
> >>>>>>> long it takes for these to get finalized.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Friso
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 12 jan 2011, at 20:08, Stack wrote:
> >>>>>>>
> >>>>>>>> 2011/1/12 Friso van Vollenhoven <[email protected]>:
> >>>>>>>>> No, I haven't. But the Hadoop (mapreduce) LZO compression is not
> >>>>>>>>> the problem. Compressing the map output using LZO works just fine.
> >>>>>>>>> The problem is HBase LZO compression. The region server process is
> >>>>>>>>> the one with the memory leak...
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> (Sorry for dumb question Friso) But HBase is leaking because we
> >>>>>>>> make use of the Compression API in a manner that produces leaks?
> >>>>>>>> Thanks,
> >>>>>>>> St.Ack
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Todd Lipcon
> >>>>>> Software Engineer, Cloudera
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Todd Lipcon
> >>>>> Software Engineer, Cloudera
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
