Hi Todd, 

> Yep - but that jar isn't wire-compatible with a CDH3b3 cluster. So if you
> are running a CDH3b3 cluster for one of the other features it includes, you
> need to use a 3b3 client jar as well,

Yeah, I saw the "+737" suffix after the version number. Thanks for clarifying
it. (And sorry for the bad suggestion.)


> And
> in some circumstances (like all the rigged tests I've attempted to do) these
> get cleaned up nicely by the JVM. It seems only in pretty large heaps in
> real workloads does the leak actually end up running away.

That would explain why this issue is circumstance-dependent: we have no direct
control over deallocating those buffers. We need them to be GCed, but their
native allocations never occupy the Java heap, so nothing encourages the GC to
run.
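Here's a minimal sketch of that mechanism. This is illustrative only, not the
actual hadoop-lzo code; the class name is made up, and the 64KB size and 1000
iterations just mirror the numbers mentioned earlier in this thread:

  import java.nio.ByteBuffer;

  public class DirectBufferPressureDemo {
      public static void main(String[] args) {
          // Each allocateDirect() call reserves 64KB of native (off-heap)
          // memory. The Java-side ByteBuffer wrapper is tiny, so even a
          // thousand of these barely move heap occupancy -- the GC sees
          // no reason to run.
          for (int i = 0; i < 1000; i++) {
              ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
              // Dropping the reference frees nothing by itself; the 64KB
              // of native memory is only released after the GC collects
              // the wrapper object.
          }
          // With a large -Xmx and light allocation pressure, the ~64MB of
          // native memory reserved above can linger for a long time.
          Runtime rt = Runtime.getRuntime();
          System.out.println("Heap used: "
                  + (rt.totalMemory() - rt.freeMemory()) / 1024 + "KB");
      }
  }

If I remember correctly, -XX:MaxDirectMemorySize caps the total native
reservation and forces a GC before throwing OutOfMemoryError, so it might be a
useful stopgap until Todd's fix lands.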

-Tatsuya


On Jan 13, 2011, at 8:30 AM, Todd Lipcon <[email protected]> wrote:

> On Wed, Jan 12, 2011 at 3:25 PM, Tatsuya Kawano <[email protected]> wrote:
> 
>> Hi Friso and everyone,
>> 
>> OK. We don't have to spend time juggling hadoop-core jars anymore, since
>> Todd is working hard on improving the hadoop-lzo behavior.
>> 
>> I think your assumption is correct, but what I was trying to say is that
>> HBase hasn't changed the way it uses Hadoop compressors since the HBase
>> 0.20 release, while Hadoop added reinit() in 0.21. I verified that ASF
>> Hadoop 0.21 and CDH3b3 have reinit(), and that ASF Hadoop 0.20.2
>> (including its append branch) and CDH3b2 don't. Since you had no problem
>> running HBase 0.89 on CDH3b2, I figured HBase 0.90 would work fine on ASF
>> Hadoop 0.20.2, because neither of them has reinit().
>> 
>> 
> Yep - but that jar isn't wire-compatible with a CDH3b3 cluster. So if you
> are running a CDH3b3 cluster for one of the other features it includes, you
> need to use a 3b3 client jar as well, which includes the reinit stuff.
> 
> 
>> HBase creates an output compression stream for each compression block, and
>> one HFile flush will contain roughly 1000 compression blocks. I think
>> reinit() could get called 1000 times in one flush, and if hadoop-lzo
>> allocates a 64MB buffer on each reinit() (even though HBase's compression
>> blocks are only about 64KB), that would add up to pretty much what you're
>> observing now.
>> 
>> 
> Yep - though I think it's only leaking a 64K buffer per reinit in 0.4.8. And
> in some circumstances (like all the rigged tests I've attempted to do) these
> get cleaned up nicely by the JVM. It seems only in pretty large heaps in
> real workloads does the leak actually end up running away.
> 
> -Todd
> 
>> 
>> On Jan 13, 2011, at 7:50 AM, Todd Lipcon <[email protected]> wrote:
>> 
>>> Can someone who is having this issue try checking out the following git
>>> branch and rebuilding LZO?
>>> 
>>> https://github.com/toddlipcon/hadoop-lzo/tree/realloc
>>> 
>>> This definitely stems one leak of a 64KB direct buffer on every reinit.
>>> 
>>> -Todd
>>> 
>>> On Wed, Jan 12, 2011 at 2:12 PM, Todd Lipcon <[email protected]> wrote:
>>> 
>>>> Yea, you're definitely on the right track. Have you considered systems
>>>> programming, Friso? :)
>>>> 
>>>> Hopefully should have a candidate patch to LZO later today.
>>>> 
>>>> -Todd
>>>> 
>>>> On Wed, Jan 12, 2011 at 1:20 PM, Friso van Vollenhoven <
>>>> [email protected]> wrote:
>>>> 
>>>>> Hi,
>>>>> My guess is indeed that it has to do with using the reinit() method on
>>>>> compressors, making them long-lived instead of throwaway, together with
>>>>> the LZO implementation of reinit(), which magically causes NIO buffer
>>>>> objects not to be finalized and, as a result, not to release their
>>>>> native allocations. It's just a theory and I haven't had the time to
>>>>> properly verify it (unfortunately, I spend most of my time writing
>>>>> application code), but Todd said he will be looking into it further. I
>>>>> browsed the LZO code to see what was going on there, but with my
>>>>> limited knowledge of the HBase code it would be bold to say that this
>>>>> is for sure the case. It would be my first direction of investigation.
>>>>> I would add some logging to the LZO code where new direct byte buffers
>>>>> are created, to log how often that happens and what size they are, and
>>>>> then redo the workload that shows the leak. Together with some
>>>>> profiling, you should be able to see how long it takes for these to
>>>>> get finalized.
>>>>> 
>>>>> Cheers,
>>>>> Friso
>>>>> 
>>>>> 
>>>>> 
>>>>> On 12 Jan 2011, at 20:08, Stack wrote:
>>>>> 
>>>>>> 2011/1/12 Friso van Vollenhoven <[email protected]>:
>>>>>>> No, I haven't. But the Hadoop (mapreduce) LZO compression is not the
>>>>>>> problem. Compressing the map output using LZO works just fine. The
>>>>>>> problem is HBase LZO compression. The region server process is the
>>>>>>> one with the memory leak...
>>>>>>> 
>>>>>> 
>>>>>> (Sorry for dumb question Friso) But HBase is leaking because we make
>>>>>> use of the Compression API in a manner that produces leaks?
>>>>>> Thanks,
>>>>>> St.Ack
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>> 
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
