Today one of our HMaster processes reached nearly complete heap consumption. I grabbed a heap dump with jmap, and analysis showed 48 MB of live objects and 839 MB of unreachable objects. (Xmx is set to 1G.) I forced a full GC, and old gen heap usage immediately dropped to 48 MB.
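One way to watch the gap between heap occupancy and what actually survives collection, without taking a heap dump, is the standard java.lang.management API. The sketch below is illustrative only (the class name HeapPoolReport is hypothetical): for each heap pool it prints the current usage next to the usage recorded right after that pool's most recent collection, where the JVM tracks it. Under CMS the old generation pool is typically named "CMS Old Gen"; a large, growing difference between the two numbers for that pool would match the 48 MB live vs. 839 MB unreachable picture above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPoolReport {
    /** Converts bytes to whole mebibytes (rounded down). */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** One line per heap pool: current usage vs. usage right after that pool's last GC. */
    static List<String> report(List<MemoryPoolMXBean> pools) {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : pools) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache or perm gen
            }
            MemoryUsage now = pool.getUsage();
            if (now == null) {
                continue; // pool is no longer valid
            }
            // getCollectionUsage() returns null when the JVM does not track it for this pool
            MemoryUsage afterGc = pool.getCollectionUsage();
            String after = (afterGc == null) ? "n/a" : toMb(afterGc.getUsed()) + " MB";
            String max = (now.getMax() < 0) ? "undefined" : toMb(now.getMax()) + " MB";
            lines.add(String.format("%-20s used=%d MB  after-last-GC=%s  max=%s",
                    pool.getName(), toMb(now.getUsed()), after, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report(ManagementFactory.getMemoryPoolMXBeans())) {
            System.out.println(line);
        }
    }
}
```

As written it reports on the JVM it runs in. To look at the HMaster itself, the same beans should be reachable over the JMX port from the command line (10101, with authentication and SSL disabled), either in jconsole or by passing ManagementFactory.getPlatformMXBeans(connection, MemoryPoolMXBean.class) (Java 7+) for a remote MBeanServerConnection to report(...).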
This HMaster process has been running continuously since February 21st. Looking through my GC logs, I noticed that the gaps between CMS runs grew much longer as time passed. I have log entries for 21 CMS runs on Feb 21st, 1 CMS run on the 22nd, and 1 on the 23rd; as of today (Mar 15), the most recent CMS run was on March 10, 5 days ago. That run successfully brought heap usage down to ~37 MB, but nothing I can see explains why the intervals between GC runs are so long.

Here are my JVM parameters (generated by the CDH scripts):

hbase 16150 0.7 2.4 1820300 1194952 ? Sl Feb21 252:09 /usr/java/jdk1.6.0_22/bin/java -Xmx1024m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/lib/hbase/bin/../logs/gc-hbase.log -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/lib/hbase/bin/../logs/gc-hbase.log -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/lib/hbase/bin/../logs/gc-hbase.log -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/lib/hbase/bin/../logs/gc-hbase.log -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=10101 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=10101 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=10101 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=10101 -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-alf-name1002.ve.box.net.log -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hbase/bin/../lib/native/Linux-amd64-64:/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64 -classpath /usr/lib/hbase/bin/../conf:/usr/java/jdk1.6.0_22/lib/tools.jar:/usr/lib/hbase/bin/..:/usr/lib/hbase/bin/../hbase-0.90.3-cdh3u1.jar:/usr/lib/hbase/bin/../hbase-0.90.3-cdh3u1-tests.jar:/usr/lib/hbase/bin/../lib/activation-1.1.jar:/usr/lib/hbase/bin/../lib/asm-3.1.jar:/usr/lib/hbase/bin/../lib/avro-1.3.3.jar:/usr/lib/hbase/bin/../lib/commons-cli-1.2.jar:/usr/lib/hbase/bin/../lib/commons-codec-1.4.jar:/usr/lib/hbase/bin/../lib/commons-el-1.0.jar:/usr/lib/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/lib/hbase/bin/../lib/commons-lang-2.5.jar:/usr/lib/hbase/bin/../lib/commons-logging-1.1.1.jar:/usr/lib/hbase/bin/../lib/commons-net-1.4.1.jar:/usr/lib/hbase/bin/../lib/core-3.1.1.jar:/usr/lib/hbase/bin/../lib/guava-r06.jar:/usr/lib/hbase/bin/../lib/hadoop-core.jar:/usr/lib/hbase/bin/../lib/jackson-core-asl-1.5.2.jar:/usr/lib/hbase/bin/../lib/jackson-jaxrs-1.5.5.jar:/usr/lib/hbase/bin/../lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hbase/bin/../lib/jackson-xc-1.5.5.jar:/usr/lib/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/lib/hbase/bin/../lib/jaxb-impl-2.1.12.jar:/usr/lib/hbase/bin/../lib/jersey-core-1.4.jar:/usr/lib/hbase/bin/../lib/jersey-json-1.4.jar:/usr/lib/hbase/bin/../lib/jersey-server-1.4.jar:/usr/lib/hbase/bin/../lib/jettison-1.1.jar:/usr/lib/hbase/bin/../lib/jetty-6.1.26.jar:/usr/lib/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/lib/hbase/bin/../lib/jruby-complete-1.6.0.jar:/usr/lib/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1.jar:/usr/lib/hbase/bin/../lib/jsr311-api-1.1.1.jar:/usr/lib/hbase/bin/../lib/log4j-1.2.16.jar:/usr/lib/hbase/bin/../lib/protobuf-java-2.3.0.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5.jar:/usr/lib/hbase/bin/../lib/slf4j-api-1.5.8.jar:/usr/lib/hbase/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/lib/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/lib/hbase/bin/../lib/thrift-0.2.0.jar:/usr/lib/hbase/bin/../lib/xmlenc-0.52.jar:/usr/lib/hbase/bin/../lib/zookeeper.jar org.apache.hadoop.hbase.master.HMaster start

My GC logs are here: https://www.box.com/s/80b8daf63320c7baad83

Any ideas on what could be preventing CMS from keeping the heap clear, or why GC performance appears to have degraded so much over time?
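Counting CMS runs per day by hand gets tedious over several weeks of logs. A rough sketch (Java 8+, hypothetical class name CmsCycleGaps) that scans gc-hbase.log and prints the time between consecutive CMS cycles is below. It assumes that, with -XX:+PrintGCDateStamps, each event line begins with a stamp of the form yyyy-MM-ddTHH:mm:ss.SSS followed by a numeric offset such as -0800 and a colon, and that each CMS cycle starts with a line containing CMS-initial-mark. Lines that don't fit that shape are silently skipped, so it is worth checking the counts against a few entries by eye.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CmsCycleGaps {
    // Assumed -XX:+PrintGCDateStamps prefix: yyyy-MM-ddTHH:mm:ss.SSS, a +hhmm/-hhmm offset, then ':'
    private static final Pattern STAMP = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}[+-]\\d{4}):");
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    /** Timestamps of lines that start a CMS cycle (assumed to contain "CMS-initial-mark"). */
    static List<OffsetDateTime> cycleStarts(List<String> lines) {
        List<OffsetDateTime> starts = new ArrayList<>();
        for (String line : lines) {
            if (!line.contains("CMS-initial-mark")) {
                continue;
            }
            Matcher m = STAMP.matcher(line);
            if (m.find()) {
                starts.add(OffsetDateTime.parse(m.group(1), FMT));
            }
        }
        return starts;
    }

    /** Time elapsed between each pair of consecutive cycle starts. */
    static List<Duration> gaps(List<OffsetDateTime> starts) {
        List<Duration> out = new ArrayList<>();
        for (int i = 1; i < starts.size(); i++) {
            out.add(Duration.between(starts.get(i - 1), starts.get(i)));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "gc-hbase.log";
        List<OffsetDateTime> starts = cycleStarts(Files.readAllLines(Paths.get(path)));
        List<Duration> gaps = gaps(starts);
        System.out.println(starts.size() + " CMS cycles found");
        for (int i = 0; i < gaps.size(); i++) {
            System.out.println(starts.get(i + 1) + "  (" + gaps.get(i).toHours() + " h after previous)");
        }
    }
}
```

Plotting those gaps against the old gen occupancy reported on the same initial-mark lines would show whether cycles stop starting even as occupancy climbs past the 60% CMSInitiatingOccupancyFraction.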
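The command line repeats the same GC and JMX options four times. Since the copies look identical they are probably harmless, but the effective values can be confirmed from inside a JVM. The sketch below (hypothetical class name EffectiveGcFlags, Java 8+) reads the flags from the post through com.sun.management.HotSpotDiagnosticMXBean and lists any input argument that occurs more than once; MaxHeapSize is the HotSpot flag that -Xmx sets. Flags the running JVM doesn't know (the CMS flags were removed in later JDKs) are reported as absent instead of failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EffectiveGcFlags {
    // Flags taken from the HMaster command line; MaxHeapSize is what -Xmx sets.
    static final String[] FLAGS = {
        "UseConcMarkSweepGC", "CMSIncrementalMode", "CMSInitiatingOccupancyFraction", "MaxHeapSize"
    };

    /** Effective value of each flag in this JVM, or "absent" if the JVM doesn't know the flag. */
    static Map<String, String> effective() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> out = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            if (hs == null) {
                out.put(flag, "absent"); // not a HotSpot-style JVM
                continue;
            }
            try {
                out.put(flag, hs.getVMOption(flag).getValue());
            } catch (IllegalArgumentException e) {
                out.put(flag, "absent"); // flag does not exist in this JVM version
            }
        }
        return out;
    }

    /** Arguments that occur more than once, with their occurrence counts. */
    static Map<String, Integer> repeated(List<String> args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String arg : args) {
            counts.merge(arg, 1, Integer::sum);
        }
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        effective().forEach((flag, value) -> System.out.println(flag + " = " + value));
        List<String> input = ManagementFactory.getRuntimeMXBean().getInputArguments();
        repeated(input).forEach((arg, n) -> System.out.println("repeated x" + n + ": " + arg));
    }
}
```

Launched with the same options as the HMaster on a JDK that still supports CMS (for example JDK 8), it should print the values CMS actually ended up with.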
