It's the version at https://github.com/toddlipcon/hadoop-lzo that was bumped to 0.4.7 because of this fix (download and build it yourself). You need this version to go with CDH3b3. I don't know how this relates to the ASF release / trunk of HBase. This version is a fork of https://github.com/kevinweil/hadoop-lzo, which is what I used before (on ASF HBase and CDH3b2).
Friso

On 16 nov 2010, at 19:47, Sean Bigdatafun wrote:

Hi Todd,

Can you please give the URL of this fix?

Thanks,
Sean

On Sat, Nov 13, 2010 at 9:10 PM, Todd Lipcon <[email protected]> wrote:

Hi Friso,

I think I identified the issue. As you suspected, we were unnecessarily allocating a lot of native byte buffers in the LZO code where we weren't before. I just pushed a fix to my LZO repository and bumped the version number to 0.4.7. If you have a chance to test this in a dev environment, that would be great. I will try to test it myself this week. (Unfortunately, I wasn't able to reproduce the issue yet.)

Thanks,
-Todd

On Fri, Nov 12, 2010 at 4:09 PM, Todd Lipcon <[email protected]> wrote:

Hey Friso,

Thanks so much for the details. I am starting to suspect it could indeed be a codec leak, especially since you have some cells that run into the megabytes; maybe it's expanding some buffers to 64MB. Let me try to do some tests to reproduce it here in the next week or so.

Anyone else seen this issue?

Thanks,
-Todd

On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven <[email protected]> wrote:

Hi Todd,

I am afraid I no longer have the broken setup around, because we really need a working one right now. We need to demo at a conference next week, and until after that all changes are frozen both on dev and prod (so we can use dev as a fallback). Later on I could maybe try some more things on our dev boxes.

If you are doing a repro, here's what you'd probably want to know:

The workload is write-only: no reads happening at the same time and no other active clients. It is an initial import of data. We do insertions in an MR job from the reducers. The total volume is about 11 billion puts across roughly 450K rows per table (we have a many-columns-per-row data model) across 15 tables, all of which use LZO. Qualifiers are some 50 bytes. Values generally range from a few KBs up to MBs in rare cases. The row keys have a time-related part at the start, so I know the keyspace in advance, and I create the empty tables with pre-created regions (40 regions) spread across the keyspace to get decent distribution from the start of the job (a sketch of this follows below). In order not to overload HBase, I run the job with only 15 reducers, so at most 15 concurrent clients are active.

Other settings: max file size is 1GB, HFile block size is the default 64K, client-side buffer is 16M, memstore flush size is 128M, compaction threshold is 5, blocking store files is 9, memstore upper limit is 20%, lower limit 15%, block cache 40%. During the run, the RSes never report more than 5GB of heap usage in the UI, which makes sense, because the block cache is not touched. On a healthy run with somewhat conservative settings right now, HBase reports on average about 380K requests per second in the master UI.

The cluster has 8 workers, each running a TT, DN, RS and another JVM process for our own software that sits in front of HBase. Workers are dual quad cores with 64GB RAM and 10x 600GB disks (we decided to scale the number of seeks we can do concurrently). Disks are quite fast: 10K RPM. MR task VMs get 1GB of heap, TT and DN also. The RS gets 16GB of heap, and our own software does too. We run 8 mappers and 4 reducers per node, so at the absolute max we should have 46GB of allocated heap. That leaves 18GB for JVM overhead, native allocations and the OS. We run Linux 2.6.18-194.11.4.el5. I think it is CentOS, but I didn't do the installs myself.
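[Not part of the thread, but to make the pre-created regions concrete: a minimal sketch of creating a pre-split LZO table, assuming the 0.90-era (CDH3) HBase client API. The table name, family name, and two-digit split prefixes are hypothetical stand-ins; the real split keys would be derived from the known time-related key prefix described above.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table and family names; LZO on the family, as in the thread.
    HTableDescriptor table = new HTableDescriptor("demo_table");
    HColumnDescriptor family = new HColumnDescriptor("d");
    family.setCompressionType(Compression.Algorithm.LZO);
    table.addFamily(family);

    // 40 regions need 39 split keys. Two-digit prefixes stand in for the
    // time-related key prefixes, so writes spread over all regions from the
    // start instead of hammering a single region until it splits.
    byte[][] splits = new byte[39][];
    for (int i = 0; i < splits.length; i++) {
      splits[i] = Bytes.toBytes(String.format("%02d", i + 1));
    }
    admin.createTable(table, splits);
  }
}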
I tried numerous different settings, both more extreme and more conservative, to get the thing working, but in the end it always ends up swapping. I should have tried a run without LZO, of course, but I was out of time by then.

Cheers,
Friso

On 12 nov 2010, at 07:06, Todd Lipcon wrote:

Hrm, any chance you can run with a smaller heap and get a jmap dump? The Eclipse MAT tool is also super nice for looking at this stuff, if indeed they are Java objects.

What kind of workload are you using? Read mostly? Write mostly? Mixed?

I will try to repro.

-Todd

On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <[email protected]> wrote:

I figured the same. I also did a run with CMS instead of G1. Same results. I also did a run with the RS heap tuned down to 12GB and 8GB, but given enough time the process still grows to over 40GB in size.

Friso

On 12 nov 2010, at 01:55, Todd Lipcon wrote:

Can you try running this with CMS GC instead of G1GC? G1 still has some bugs... 64M sounds like it might be G1 "regions"?

-Todd

On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <[email protected]> wrote:

Hi All,

(This is all about CDH3, so I am not sure whether it should go on this list, but I figure it is at least interesting for people trying the same.)

I've recently tried CDH3 on a new cluster, installed from RPMs, with the hadoop-lzo fork from https://github.com/toddlipcon/hadoop-lzo. Everything works like a charm initially, but after some time (minutes to max one hour), the RS JVM process memory grows to more than twice the given heap size and beyond. I have seen a RS with a 16GB heap grow to 55GB virtual size. At some point everything starts swapping, GC times go into the minutes, and everything dies or is considered dead by the master.

I did a pmap -x on the RS process, and that shows a lot of allocated blocks of about 64M. There are about 500 of these, which is 32GB in total. See: http://pastebin.com/8pgzPf7b (bottom of the file; the blocks of about 1M on top are probably thread stacks). Unfortunately, Linux shows the native heap as anon blocks, so I cannot link it to a specific lib or anything.

I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the one which has the reinit() support). I run Java 6u21 with the G1 garbage collector, which has been running fine for some weeks now. The full command line is:

java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/export/logs/hbase/gc-hbase.log -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64 -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r

I searched the HBase source for something that could point to native heap usage (like ByteBuffer#allocateDirect(...)), but I could not find anything. Thread count is about 185 (I have 100 handlers), so nothing strange there either.

The question is: could this be HBase, or is this a problem with hadoop-lzo? I have currently downgraded to a version known to work, because we have a demo coming up. But I am still interested in the answer.

Regards,
Friso
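[Also not part of the thread, but a minimal sketch of the leak pattern that the 0.4.7 fix is described as addressing; illustration only, not the actual hadoop-lzo code. A direct ByteBuffer's native backing block lives outside the Java heap and is released only when the tiny ByteBuffer object itself is garbage collected, so a codec that allocates a fresh buffer on every (re)init can pin gigabytes of native memory while the Java heap looks healthy. That would be consistent with the ~500 anonymous 64M blocks (~32GB) in the pmap output above.]

import java.nio.ByteBuffer;

// Illustration of the suspected pattern; not the actual hadoop-lzo code.
public class DirectBufferLeakSketch {

  static final int SIZE = 64 * 1024 * 1024; // 64MB, matching the pmap blocks

  // Leaky pattern: each (re)init allocates a fresh native block. The old
  // block is freed only when its ByteBuffer object is GC'd, and a tiny
  // object on a 16GB heap gives the collector no reason to hurry.
  static ByteBuffer leakyInit() {
    return ByteBuffer.allocateDirect(SIZE);
  }

  // Fixed pattern: reuse the existing buffer when it is already big enough.
  static ByteBuffer reuse(ByteBuffer existing, int needed) {
    if (existing != null && existing.capacity() >= needed) {
      existing.clear(); // reset position/limit, keep the native allocation
      return existing;
    }
    return ByteBuffer.allocateDirect(needed);
  }

  public static void main(String[] args) {
    ByteBuffer buf = null;
    for (int i = 0; i < 5; i++) {
      buf = reuse(buf, SIZE); // swap in leakyInit() to watch process RSS grow
    }
    System.out.println("capacity: " + buf.capacity());
  }
}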
