Can you try running this with the CMS collector instead of G1? G1 still has some bugs... The 64M blocks sound like they might be G1 "regions"?
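A minimal sketch of the suggested switch, assuming the RegionServer's JVM options are available as a list of strings (for example, split from the command line quoted below). `-XX:+UseConcMarkSweepGC` and `-XX:+UseParNewGC` are the standard HotSpot flags for CMS with a parallel young-generation collector on Java 6; the helper name `swap_g1_for_cms` is hypothetical.

```python
# Hypothetical helper: rewrite a list of HotSpot options so the JVM uses
# CMS (with ParNew for the young generation) instead of G1.
CMS_FLAGS = ["-XX:+UseConcMarkSweepGC", "-XX:+UseParNewGC"]


def swap_g1_for_cms(jvm_opts):
    """Return a copy of jvm_opts with -XX:+UseG1GC replaced by the CMS flags.

    All other options are kept unchanged and in their original order.
    -XX:+UnlockExperimentalVMOptions is presumably there for G1, but it is
    left in place because unlocking experimental options is harmless.
    """
    result = []
    for opt in jvm_opts:
        if opt == "-XX:+UseG1GC":
            result.extend(CMS_FLAGS)  # put the CMS flags where the G1 flag was
        else:
            result.append(opt)
    return result


opts = ("-Xmx16000m -XX:+HeapDumpOnOutOfMemoryError "
        "-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops").split()
print(" ".join(swap_g1_for_cms(opts)))
```

Replacing the flag in place rather than appending keeps all options ahead of the main class name, which is where the `java` launcher expects them.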
-Todd

On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <[email protected]> wrote:
> Hi All,
>
> (This is all about CDH3, so I am not sure whether it should go on this
> list, but I figure it is at least interesting for people trying the same.)
>
> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo
> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works like
> a charm initially, but after some time (from minutes up to an hour at most),
> the RS JVM process memory grows to more than twice the given heap size and
> beyond. I have seen an RS with a 16GB heap grow to 55GB virtual size. At
> some point, everything starts swapping, GC times go into the minutes, and
> everything dies or is considered dead by the master.
>
> I did a pmap -x on the RS process, and it shows a lot of blocks of about
> 64M allocated by the process. There are about 500 of these, which is 32GB
> in total. See: http://pastebin.com/8pgzPf7b (bottom of the file; the blocks
> of about 1M at the top are probably thread stacks). Unfortunately, Linux
> shows the native heap as anon blocks, so I cannot link it to a specific lib
> or anything else.
>
> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the one
> which has the reinit() support). I run Java 6u21 with the G1 garbage
> collector, which has been running fine for some weeks now. The full command
> line is:
> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:/export/logs/hbase/gc-hbase.log
> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
>
> I searched the HBase source for anything that could point to native heap
> usage (like ByteBuffer#allocateDirect(...)), but I could not find anything.
> Thread count is about 185 (I have 100 handlers), so nothing strange there
> either.
>
> The question is, could this be HBase, or is this a problem with hadoop-lzo?
>
> I have currently downgraded to a version known to work, because we have a
> demo coming up, but I am still interested in the answer.
>
> Regards,
> Friso

--
Todd Lipcon
Software Engineer, Cloudera
