[ https://issues.apache.org/jira/browse/YARN-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626529#comment-15626529 ]
Jason Lowe commented on YARN-5368:
----------------------------------

bq. Recently I noticed same issue with NodeManager when recovery is enabled. NM RES is keep on growing which leads ResourceLocalization slow.

We have not seen that on our clusters. Three minutes is a _really_ long time. Do you have GC logging enabled for the nodemanager JVM? It would be interesting to know if it was trying to run one or more GC cycles during that time. If it wasn't GC cycles then I'm not sure how increased off-heap memory would directly contribute to slower resource localization unless the machine was near or at the point where it started swapping.

As for the timeline server memory usage, it looks like the rolling level db instances are starting to pile up, accumulating a lot of off-heap memory. Pinging [~jeagles] since I vaguely remember something like this occurring in the past, and there may be a known fix for that issue.

> memory leak at timeline server
> ------------------------------
>
>                 Key: YARN-5368
>                 URL: https://issues.apache.org/jira/browse/YARN-5368
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.7.1
>         Environment: HDP2.4
>                      CentOS 6.7
>                      jdk1.8.0_72
>            Reporter: Wataru Yukawa
>
> memory usage of timeline server machine increases gradually.
> https://gyazo.com/952dad96c77ae053bae2e4d8c8ab0572
> please check since April.
> According to my investigation, timeline server used about 25GB.
> top command result
> {code}
> 90577 yarn 20 0 28.4g 25g 12m S 0.0 40.1 5162:53
> /usr/java/jdk1.8.0_72/bin/java -Dproc_timelineserver -Xmx1024m
> -Dhdp.version=2.4.0.0-169 -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn
> -Dyarn.log.dir=/var/log/hadoop-yarn/yarn ...
> {code}
> ps command result
> {code}
> $ ps ww 90577
> 90577 ? Sl 5162:53 /usr/java/jdk1.8.0_72/bin/java
> -Dproc_timelineserver -Xmx1024m -Dhdp.version=2.4.0.0-169
> -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn
> -Dyarn.log.dir=/var/log/hadoop-yarn/yarn
> -Dhadoop.log.file=yarn-yarn-timelineserver-myhost.log
> -Dyarn.log.file=yarn-yarn-timelineserver-myhost.log -Dyarn.home.dir=
> -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,EWMA,RFA
> -Dyarn.root.logger=INFO,EWMA,RFA
> -Djava.library.path=:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
> -Dyarn.policy.file=hadoop-policy.xml
> -Djava.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
> -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn
> -Dyarn.log.dir=/var/log/hadoop-yarn/yarn
> -Dhadoop.log.file=yarn-yarn-timelineserver-myhost.log
> -Dyarn.log.file=yarn-yarn-timelineserver-myhost.log
> -Dyarn.home.dir=/usr/hdp/current/hadoop-yarn-timelineserver
> -Dhadoop.home.dir=/usr/hdp/2.4.0.0-169/hadoop
> -Dhadoop.root.logger=INFO,EWMA,RFA -Dyarn.root.logger=INFO,EWMA,RFA
> -Djava.library.path=:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
> -classpath
> /usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/lib/*:/usr/hdp/2.4.0.0-169/hadoop/.//*:/usr/hdp/2.4.0.0-169/hadoop-hdfs/./:/usr/hdp/2.4.0.0-169/hadoop-hdfs/lib/*:/usr/hdp/2.4.0.0-169/hadoop-hdfs/.//*:/usr/hdp/2.4.0.0-169/hadoop-yarn/lib/*:/usr/hdp/2.4.0.0-169/hadoop-yarn/.//*:/usr/hdp/2.4.0.0-169/hadoop-mapreduce/lib/*:/usr/hdp/2.4.0.0-169/hadoop-mapreduce/.//*::/usr/hdp/2.4.0.0-169/tez/*:/usr/hdp/2.4.0.0-169/tez/lib/*:/usr/hdp/2.4.0.0-169/tez/conf:/usr/hdp/2.4.0.0-169/tez/*:/usr/hdp/2.4.0.0-169/tez/lib/*:/usr/hdp/2.4.0.0-169/tez/conf:/usr/hdp/current/hadoop-yarn-timelineserver/.//*:/usr/hdp/current/hadoop-yarn-timelineserver/lib/*:/usr/hdp/2.4.0.0-169/hadoop/conf/timelineserver-config/log4j.properties
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
> {code}
>
> Although I set -Xmx1024m, actual memory usage is 25GB.
> After I restart timeline server, memory usage of timeline server machine
> decreases.
> https://gyazo.com/130600c17a7d41df8606727a859ae7e3
> Now timelineserver uses less than 1GB memory.
> top command result
> {code}
> 6163 yarn 20 0 3959m 783m 46m S 0.3 1.2 3:37.60
> /usr/java/jdk1.8.0_72/bin/java -Dproc_timelineserver -Xmx1024m
> -Dhdp.version=2.4.0.0-169 ...
> {code}
> I suspect memory leak at timeline server.
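
For reference, the comparison the report draws between -Xmx1024m and the resident size shown by top can be written out as simple arithmetic. The following is only a rough sketch: the function names are hypothetical, the size suffixes are assumed to be binary units (m = MiB, g = GiB), and RES minus -Xmx is just a lower bound on resident memory outside the Java heap, since the heap may be smaller than its configured maximum.

```python
import re

# Assumption: suffixes are binary multiples, as printed by top and accepted by -Xmx.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a size such as '25g', '783m' or '1024m' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kmgt])", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def min_non_heap_bytes(res, xmx):
    """Lower bound on resident memory outside the Java heap (RES - Xmx, floored at 0)."""
    return max(0, parse_size(res) - parse_size(xmx))


if __name__ == "__main__":
    # RES values and -Xmx taken from the two top outputs in the report.
    before_restart = min_non_heap_bytes("25g", "1024m")
    after_restart = min_non_heap_bytes("783m", "1024m")
    print(f"before restart: at least {before_restart / 1024**3:.1f} GiB outside the heap")
    print(f"after restart:  at least {after_restart / 1024**3:.1f} GiB outside the heap")
```

With the numbers from the report, at least 24 GiB of the 25g resident size before the restart cannot be Java heap, which fits the suspicion that the growth is in off-heap memory.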