[jira] [Created] (YARN-11039) LogAggregationFileControllerFactory::getFileControllerForRead should close FS
Rajesh Balamohan created YARN-11039:
---------------------------------------

             Summary: LogAggregationFileControllerFactory::getFileControllerForRead should close FS
                 Key: YARN-11039
                 URL: https://issues.apache.org/jira/browse/YARN-11039
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation
            Reporter: Rajesh Balamohan

LogAggregationFileControllerFactory::getFileControllerForRead internally opens a new FS object every time and never closes it. When cloud connectors (e.g. s3a) are used along with Knox, a KnoxTokenMonitor ends up being leaked for every unclosed FS object, causing thread leaks in the NM.

Lines of interest:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileControllerFactory.java#L167]

{noformat}
try {
  Path remoteAppLogDir = fileController.getOlderRemoteAppLogDir(appId, appOwner);
  if (LogAggregationUtils.getNodeFiles(conf, remoteAppLogDir, appId, appOwner).hasNext()) {
    return fileController;
  }
} catch (Exception ex) {
  diagnosticsMsg.append(ex.getMessage() + "\n");
  continue;
}
{noformat}

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java#L252]

{noformat}
public static RemoteIterator<FileStatus> getNodeFiles(Configuration conf,
    Path remoteAppLogDir, ApplicationId appId, String appOwner) throws IOException {
  Path qualifiedLogDir = FileContext.getFileContext(conf).makeQualified(remoteAppLogDir);
  return FileContext.getFileContext(
      qualifiedLogDir.toUri(), conf).listStatus(remoteAppLogDir);
}
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
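The leak pattern above can be sketched independently of Hadoop. The following is a hedged illustration (all names here are hypothetical stand-ins, not the actual patch): a method that opens a Closeable resource per call leaks unless something closes it, while try-with-resources bounds the resource's lifetime to the work being done.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a FileSystem-like resource; counts open
// instances so the leak is observable.
class TrackedResource implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();
    TrackedResource() { OPEN.incrementAndGet(); }
    @Override public void close() { OPEN.decrementAndGet(); }
}

public class FsLeakSketch {
    // Leaky pattern: mirrors a helper that creates a new resource per call
    // and never closes it.
    static void leakyCall() {
        TrackedResource fs = new TrackedResource();
        // ... use fs (e.g. list files); fs is never closed
    }

    // Fixed pattern: try-with-resources closes the resource once the work
    // inside the block is done.
    static void fixedCall() {
        try (TrackedResource fs = new TrackedResource()) {
            // ... use fs while it is open
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) leakyCall();
        System.out.println("open after leaky calls: " + TrackedResource.OPEN.get());
        TrackedResource.OPEN.set(0);
        for (int i = 0; i < 100; i++) fixedCall();
        System.out.println("open after fixed calls: " + TrackedResource.OPEN.get());
    }
}
```

In the real code the situation is more subtle, since the returned RemoteIterator still needs the FS open while the caller iterates, so the close has to happen after consumption rather than inside getNodeFiles itself.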
[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers
[ https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962766#comment-15962766 ]

Rajesh Balamohan commented on YARN-5764:
----------------------------------------

[~devaraj.k] Thank you for sharing the patch and the results.

Recent JVM versions have {{-XX:+UseNUMA}} (check with {{java -XX:+PrintFlagsFinal | grep UseNUMA}}). Enabling this instructs the JVM to be NUMA aware, and GC can take advantage of this fact. Was this flag enabled in the tasks when running the benchmark?

Hive on MR is outdated, network intensive, and slow. It would be great if the BB benchmark could be run with Hive on Tez, which optimizes queries to a great extent. It has much better resource utilization, eliminates a lot of IO barriers, and would be far more efficient than the MR codebase.

> NUMA awareness support for launching containers
> -----------------------------------------------
>
>                 Key: YARN-5764
>                 URL: https://issues.apache.org/jira/browse/YARN-5764
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, yarn
>            Reporter: Olasoji
>            Assignee: Devaraj K
>         Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing costly remote memory accesses on non-SMP systems. YARN containers, on launch, will be pinned to a specific NUMA node and all subsequent memory allocations will be served by the same node, reducing remote memory accesses. The current default behavior is to spread memory across all NUMA nodes.
[jira] [Commented] (YARN-6017) node manager physical memory leak
[ https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772254#comment-15772254 ]

Rajesh Balamohan commented on YARN-6017:
----------------------------------------

No. From the JVM's accounting perspective it is still at 2048m as per the logs. But we need to check whether it has anything to do with the JVM's internal code itself or Netty. Have you tried other JDK versions?

> node manager physical memory leak
> ---------------------------------
>
>                 Key: YARN-6017
>                 URL: https://issues.apache.org/jira/browse/YARN-6017
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.1
>         Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>            Reporter: chenrongwei
>         Attachments: 31169_smaps.txt, 31169_smaps.txt
>
>
> In our production environment, the node manager's JVM memory has been set to '-Xmx2048m', but we noticed that after a long time running, the process' actual physical memory size had reached 12g (we got this value by the top command as follows).
> PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND > 31169 data 20 0 13.2g 12g 6092 S 16.9 13.0 49183:13 java > 31169: /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m > -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs > -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs > -Dhadoop.log.file=yarn-data-nodemanager.log > -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data > -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA > -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native > -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M > -XX:+UseC > Address Kbytes Mode Offset DeviceMapping > 0040 4 r-x-- 008:1 java > 0060 4 rw--- 008:1 java > 00601000 10094936 rw--- 000:0 [ anon ] > 00077000 2228224 rw--- 000:0 [ anon ] > 0007f800 131072 rw--- 000:0 [ anon ] > 00325ee0 128 r-x-- 008:1 ld-2.12.so > 00325f01f000 4 r 0001f000 008:1 ld-2.12.so > 00325f02 4 rw--- 0002 008:1 ld-2.12.so > 00325f021000 4 rw--- 000:0 [ anon ] > 00325f201576 r-x-- 008:1 libc-2.12.so > 00325f38a0002048 - 0018a000 008:1 libc-2.12.so > 00325f58a000 16 r 0018a000 008:1 libc-2.12.so > 00325f58e000 4 rw--- 0018e000 008:1 libc-2.12.so > 00325f58f000 20 rw--- 000:0 [ anon ] > 00325f60 92 r-x-- 008:1 libpthread-2.12.so > 00325f6170002048 - 00017000 008:1 libpthread-2.12.so > 00325f817000 4 r 00017000 008:1 libpthread-2.12.so > 00325f818000 4 rw--- 00018000 008:1 libpthread-2.12.so > 00325f819000 16 rw--- 000:0 [ anon ] > 00325fa0 8 r-x-- 008:1 libdl-2.12.so > 00325fa020002048 - 2000 008:1 libdl-2.12.so > 00325fc02000 4 r 2000 008:1 libdl-2.12.so > 00325fc03000 4 rw--- 3000 008:1 libdl-2.12.so > 00325fe0 28 r-x-- 008:1 librt-2.12.so > 00325fe070002044 - 7000 008:1 librt-2.12.so > 003260006000 4 r 6000 008:1 librt-2.12.so > 003260007000 4 rw--- 7000 008:1 librt-2.12.so > 00326020 524 r-x-- 008:1 libm-2.12.so > 0032602830002044 - 00083000 008:1 libm-2.12.so > 003260482000 4 r 00082000 008:1 libm-2.12.so > 
003260483000 4 rw--- 00083000 008:1 libm-2.12.so > 00326120 88 r-x-- 008:1 libresolv-2.12.so > 0032612160002048 - 00016000 008:1 libresolv-2.12.so > 003261416000 4 r 00016000 008:1 libresolv-2.12.so > 003261417000 4 rw--- 00017000 008:1 libresolv-2.12.so
[jira] [Commented] (YARN-6017) node manager physical memory leak
[ https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772212#comment-15772212 ]

Rajesh Balamohan commented on YARN-6017:
----------------------------------------

"10193716 in [heap]" in smaps indicates that it is from the native side.

> node manager physical memory leak
> ---------------------------------
>
>                 Key: YARN-6017
>                 URL: https://issues.apache.org/jira/browse/YARN-6017
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.1
>            Reporter: chenrongwei
>         Attachments: 31169_smaps.txt, 31169_smaps.txt
[jira] [Commented] (YARN-6017) node manager physical memory leak
[ https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772116#comment-15772116 ]

Rajesh Balamohan commented on YARN-6017:
----------------------------------------

Usage seems to be from the native side. Can you also post "/proc/31169/smaps"?

> node manager physical memory leak
> ---------------------------------
>
>                 Key: YARN-6017
>                 URL: https://issues.apache.org/jira/browse/YARN-6017
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.1
>            Reporter: chenrongwei
[jira] [Commented] (YARN-6017) node manager physical memory leak
[ https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771984#comment-15771984 ]

Rajesh Balamohan commented on YARN-6017:
----------------------------------------

Can you share the details of
{noformat}
jmap -heap <pid>
{noformat}
Can you also get a heap dump if the process is still alive?
{noformat}
jmap -dump:format=b,file=/tmp/nm.hprof <pid>
{noformat}

> node manager physical memory leak
> ---------------------------------
>
>                 Key: YARN-6017
>                 URL: https://issues.apache.org/jira/browse/YARN-6017
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.1
>            Reporter: chenrongwei
[jira] [Updated] (YARN-5551) Ignore file backed pages from memory computation when smaps is enabled
[ https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-5551:
-----------------------------------
    Attachment: YARN-5551.branch-2.003.patch

Rebasing to address the checkstyle issues.

> Ignore file backed pages from memory computation when smaps is enabled
> -----------------------------------------------------------------------
>
>                 Key: YARN-5551
>                 URL: https://issues.apache.org/jira/browse/YARN-5551
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: YARN-5551.branch-2.001.patch, YARN-5551.branch-2.002.patch, YARN-5551.branch-2.003.patch
>
>
> Currently deleted file mappings are also included in the memory computation when SMAP is enabled. For example:
> {noformat}
> 7f612004a000-7f612004c000 rw-s 00:10 4201507513 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185 (deleted)
> Size: 8 kB
> Rss: 4 kB
> Pss: 2 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced: 4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap: 0 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> 7fbf2800-7fbf6800 rw-s 08:02 11927571 /tmp/7298569189125604642/arena-1291157252088664681.cache (deleted)
> Size: 1048576 kB
> Rss: 17288 kB
> Pss: 17288 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 232 kB
> Private_Dirty: 17056 kB
> Referenced: 17288 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap: 0 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() computation.
[jira] [Comment Edited] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled
[ https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515481#comment-15515481 ]

Rajesh Balamohan edited comment on YARN-5551 at 9/23/16 5:43 AM:
-----------------------------------------------------------------

Attaching .2 version which takes into account "anonymous" pages.

was (Author: rajesh.balamohan):
Attaching .2 version which takes into account "anonymous".

> Ignore deleted file mapping from memory computation when smaps is enabled
> --------------------------------------------------------------------------
>
>                 Key: YARN-5551
>                 URL: https://issues.apache.org/jira/browse/YARN-5551
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: YARN-5551.branch-2.001.patch, YARN-5551.branch-2.002.patch
>
>
> Currently deleted file mappings are also included in the memory computation when SMAP is enabled. It would be good to exclude these from getSmapBasedRssMemorySize() computation.
[jira] [Updated] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled
[ https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-5551:
-----------------------------------
    Attachment: YARN-5551.branch-2.002.patch

Attaching .2 version which takes into account "anonymous".

> Ignore deleted file mapping from memory computation when smaps is enabled
> --------------------------------------------------------------------------
>
>                 Key: YARN-5551
>                 URL: https://issues.apache.org/jira/browse/YARN-5551
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: YARN-5551.branch-2.001.patch, YARN-5551.branch-2.002.patch
>
>
> Currently deleted file mappings are also included in the memory computation when SMAP is enabled. It would be good to exclude these from getSmapBasedRssMemorySize() computation.
[jira] [Comment Edited] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled
[ https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438057#comment-15438057 ]

Rajesh Balamohan edited comment on YARN-5551 at 8/25/16 11:37 PM:
------------------------------------------------------------------

This patch worked for the scenario we ran into. If memory mapping of a file is anon=0, should that cause the process to be killed? A more generic patch would be to figure out whether a memory mapping with anon=0 should be the deciding factor for killing the process.

was (Author: rajesh.balamohan):
This patch worked for the scenario we ran into. If memory mapping of a file is anon=0, should that cause the process to be killed. A more generic patch would be figure out whether memory mapping with annon=0 should be deciding factor for killing the process.

> Ignore deleted file mapping from memory computation when smaps is enabled
> --------------------------------------------------------------------------
>
>                 Key: YARN-5551
>                 URL: https://issues.apache.org/jira/browse/YARN-5551
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation when SMAP is enabled. It would be good to exclude these from getSmapBasedRssMemorySize() computation.
[jira] [Commented] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled
[ https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438057#comment-15438057 ]

Rajesh Balamohan commented on YARN-5551:
----------------------------------------

This patch worked for the scenario we ran into. If memory mapping of a file is anon=0, should that cause the process to be killed? A more generic patch would be to figure out whether a memory mapping with anon=0 should be the deciding factor for killing the process.

> Ignore deleted file mapping from memory computation when smaps is enabled
> --------------------------------------------------------------------------
>
>                 Key: YARN-5551
>                 URL: https://issues.apache.org/jira/browse/YARN-5551
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation when SMAP is enabled. It would be good to exclude these from getSmapBasedRssMemorySize() computation.
[jira] [Updated] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled
[ https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-5551:
-----------------------------------
    Attachment: YARN-5551.branch-2.001.patch

> Ignore deleted file mapping from memory computation when smaps is enabled
> --------------------------------------------------------------------------
>
>                 Key: YARN-5551
>                 URL: https://issues.apache.org/jira/browse/YARN-5551
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation when SMAP is enabled. It would be good to exclude these from getSmapBasedRssMemorySize() computation.
[jira] [Updated] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled
[ https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-5551:
-----------------------------------
    Target Version/s: 2.7.3

> Ignore deleted file mapping from memory computation when smaps is enabled
> --------------------------------------------------------------------------
>
>                 Key: YARN-5551
>                 URL: https://issues.apache.org/jira/browse/YARN-5551
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>
> Currently deleted file mappings are also included in the memory computation when SMAP is enabled. It would be good to exclude these from getSmapBasedRssMemorySize() computation.
[jira] [Created] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled
Rajesh Balamohan created YARN-5551:
-----------------------------------

             Summary: Ignore deleted file mapping from memory computation when smaps is enabled
                 Key: YARN-5551
                 URL: https://issues.apache.org/jira/browse/YARN-5551
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Rajesh Balamohan
            Assignee: Rajesh Balamohan
            Priority: Minor

Currently deleted file mappings are also included in the memory computation when SMAP is enabled. For e.g.:

{noformat}
7f612004a000-7f612004c000 rw-s 00:10 4201507513 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185 (deleted)
Size:               8 kB
Rss:                4 kB
Pss:                2 kB
Shared_Clean:       0 kB
Shared_Dirty:       4 kB
Private_Clean:      0 kB
Private_Dirty:      0 kB
Referenced:         4 kB
Anonymous:          0 kB
AnonHugePages:      0 kB
Swap:               0 kB
KernelPageSize:     4 kB
MMUPageSize:        4 kB

7f6123f99000-7f6163f99000 rw-p 08:41 211419477 /grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache (deleted)
Size:         1048576 kB
Rss:           637292 kB
Pss:           637292 kB
Shared_Clean:       0 kB
Shared_Dirty:       0 kB
Private_Clean:      0 kB
Private_Dirty: 637292 kB
Referenced:    637292 kB
Anonymous:     637292 kB
AnonHugePages:      0 kB
Swap:               0 kB
KernelPageSize:     4 kB
{noformat}

It would be good to exclude these from the getSmapBasedRssMemorySize() computation.
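The exclusion the issue asks for amounts to skipping any mapping whose smaps header line carries the kernel's "(deleted)" suffix while summing Rss. The following is a minimal standalone sketch of that idea, with deliberately simplified parsing; it is not the actual ProcfsBasedProcessTree code, and the class and method names are illustrative:

```java
public class SmapsDeletedFilter {

    // Sums Rss (in kB) over an smaps dump, skipping any mapping whose
    // header line ends with the kernel's "(deleted)" marker.
    static long rssExcludingDeleted(String smaps) {
        long totalKb = 0;
        boolean skipCurrentMapping = false;
        for (String line : smaps.split("\n")) {
            // A mapping header looks like "7f61...-7f61... rw-p 08:41 211419477 /path".
            if (line.matches("^[0-9a-f]+-[0-9a-f]+ .*")) {
                skipCurrentMapping = line.endsWith("(deleted)");
            } else if (!skipCurrentMapping && line.startsWith("Rss:")) {
                // "Rss:           637292 kB" -> 637292
                totalKb += Long.parseLong(line.replaceAll("[^0-9]", ""));
            }
        }
        return totalKb;
    }

    public static void main(String[] args) {
        String sample =
            "7f6123f99000-7f6163f99000 rw-p 08:41 211419477 /grid/4/some.cache (deleted)\n" +
            "Rss:           637292 kB\n" +
            "7f0000000000-7f0000001000 rw-p 00:00 0 [heap]\n" +
            "Rss:              128 kB\n";
        // The 637292 kB of the deleted mapping is excluded from the total.
        System.out.println(rssExcludingDeleted(sample)); // 128
    }
}
```

This mirrors the shape of the fix the issue proposes: the decision is made once per mapping header and applied to all of that mapping's statistic lines.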
[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351356#comment-15351356 ]

Rajesh Balamohan commented on YARN-5296:
----------------------------------------

Based on an offline conversation with [~karams], I have changed the assignee to [~djp]. \cc [~djp]

> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---------------------------------------------------------------------------
>
>                 Key: YARN-5296
>                 URL: https://issues.apache.org/jira/browse/YARN-5296
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.0, 2.9.0
>            Reporter: Karam Singh
>            Assignee: Junping Du
>
> Ran tests in the following manner:
> 1. Ran GridMix with 768 apps sequentially around 17 times to execute about 12.9K apps.
> 2. After 4-5 hrs, checked the NM heap using Memory Analyser. It reported that around 96% of the heap was being used by ContainerMetrics.
> 3. Ran 7 more GridMix runs to reach around 18.2K apps in total. Checking the NM heap with Memory Analyser again showed 96% of the heap used by ContainerMetrics.
> 4. Started one more GridMix run. While the run was going on, at around 18.7K+ running apps, NMs started going down with OOM. On analysing the NM heap with Memory Analyser, the OOM was caused by ContainerMetrics.
[jira] [Updated] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-5296:
-----------------------------------
    Assignee: Junping Du

> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---------------------------------------------------------------------------
>
>                 Key: YARN-5296
>                 URL: https://issues.apache.org/jira/browse/YARN-5296
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.0, 2.9.0
>            Reporter: Karam Singh
>            Assignee: Junping Du
>
> Ran tests in the following manner:
> 1. Ran GridMix with 768 apps sequentially around 17 times to execute about 12.9K apps.
> 2. After 4-5 hrs, checked the NM heap using Memory Analyser. It reported that around 96% of the heap was being used by ContainerMetrics.
> 3. Ran 7 more GridMix runs to reach around 18.2K apps in total. Checking the NM heap with Memory Analyser again showed 96% of the heap used by ContainerMetrics.
> 4. Started one more GridMix run. While the run was going on, at around 18.7K+ running apps, NMs started going down with OOM. On analysing the NM heap with Memory Analyser, the OOM was caused by ContainerMetrics.
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700657#comment-14700657 ]

Rajesh Balamohan commented on YARN-3942:
----------------------------------------

Should this be resilient to cluster restarts? For e.g., when a cluster restart happens, the timeline server automatically gets killed with the following exception.

{noformat}
2015-08-18 01:03:31,523 [EntityLogPluginWorker #6] ERROR org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore: Error scanning active files
...
...
[EntityLogPluginWorker #0] ERROR org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore: Error scanning active files
java.io.EOFException: End of File Exception between local host is: atsmachine; destination host is: m1:8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
        at org.apache.hadoop.ipc.Client.call(Client.java:1444)
        at org.apache.hadoop.ipc.Client.call(Client.java:1371)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy26.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:574)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy27.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1748)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:973)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:984)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:956)
        at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:935)
        at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:931)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:943)
        at org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore.scanActiveLogs(EntityFileTimelineStore.java:314)
        at org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore.access$1300(EntityFileTimelineStore.java:79)
        at org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore$EntityLogScanner.run(EntityFileTimelineStore.java:771)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1098)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:993)
2015-08-18 01:03:35,600 [SIGTERM handler] ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM
2015-08-18 01:03:35,608 [Thread-1] INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@atsmachine:8188
2015-08-18 01:03:35,710 [Thread-1] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ApplicationHistoryServer metrics system...
2015-08-18 01:03:35,712 [Thread-1] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
{noformat}
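The resilience being asked about here could take the form of a bounded retry around the active-log scan, instead of letting the scanner thread die on the first IOException while the NameNode restarts. A minimal sketch of that shape; `withRetries` is an illustrative helper, not the actual EntityFileTimelineStore API:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryingScan {

    // Runs the scan, retrying up to `attempts` times on IOException
    // (e.g. the EOFException seen while the NameNode is restarting).
    static <T> T withRetries(Callable<T> scan, int attempts, long sleepMs) throws Exception {
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return scan.call();
            } catch (IOException e) {
                last = e;                 // remember and back off before retrying
                Thread.sleep(sleepMs);
            }
        }
        throw last;                       // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        // Simulate a scan that fails twice (NN restarting) and then succeeds.
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new java.io.EOFException("NameNode restarting");
            }
            return "scanned";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The design point is that transient RPC failures during a planned restart are expected, so the scanner should surface them only after a configurable number of attempts rather than terminating the daemon's worker.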
[jira] [Created] (YARN-3797) NodeManager not blacklisting the disk (shuffle) with errors
Rajesh Balamohan created YARN-3797:
-----------------------------------

             Summary: NodeManager not blacklisting the disk (shuffle) with errors
                 Key: YARN-3797
                 URL: https://issues.apache.org/jira/browse/YARN-3797
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Rajesh Balamohan

In a multi-node environment, one of the disks (where map outputs are written) in a node went bad. Errors are given below.

{noformat}
Info fld=0x9ad090a
sd 6:0:5:0: [sdf] Add. Sense: Unrecovered read error
sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 ad 09 08 00 00 08 00
end_request: critical medium error, dev sdf, sector 162334984
mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
sd 6:0:5:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:5:0: [sdf] Sense Key : Medium Error [current]
Info fld=0x9af8892
sd 6:0:5:0: [sdf] Add. Sense: Unrecovered read error
sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
end_request: critical medium error, dev sdf, sector 162498704
mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
sd 6:0:5:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:5:0: [sdf] Sense Key : Medium Error [current]
Info fld=0x9af8892
sd 6:0:5:0: [sdf] Add. Sense: Unrecovered read error
sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
end_request: critical medium error, dev sdf, sector 162498704
{noformat}

The disk checker would pass, as the system still allows creating and deleting directories without issues. But the data being served out can be corrupt, and fetchers fail during CRC verification with unwanted delays and retries. Ideally, the NodeManager should detect such errors and blacklist/remove those disks from the NM.
[jira] [Updated] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-2314:
-----------------------------------
    Attachment: tez-yarn-2314.xlsx

Attaching the results of getProxy() calls for Tez on 20 nodes with this patch, for different cache sizes and different data sizes (tested a job at 200GB and at 10 TB scale). Overall, there is a slight degradation in performance (in milliseconds) when setting the cache size to 0, but it is not significant enough to impact the overall job runtime in Tez.

> ContainerManagementProtocolProxy can create thousands of threads for a large cluster
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2314
>                 URL: https://issues.apache.org/jira/browse/YARN-2314
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: YARN-2314.patch, YARN-2314v2.patch, disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch, tez-yarn-2314.xlsx
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of
> this cache is configurable. However the cache can grow far beyond the
> configured size when running on a large cluster and blow AM address/container
> limits. More details in the first comment.
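The cache size being measured above is a client-side setting. Assuming the standard yarn-default.xml property name for the NM-proxy cache (worth verifying against the Hadoop version in use), disabling the cache as tested would look like:

```xml
<!-- Upper bound on cached NodeManager proxies kept by
     ContainerManagementProtocolProxy on the AM side.
     Setting it to 0 disables the cache entirely; the attached measurements
     suggest this costs only milliseconds per getProxy() call. -->
<property>
  <name>yarn.client.max-cached-nodemanagers-proxies</name>
  <value>0</value>
</property>
```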
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:
-----------------------------------
    Attachment: YARN-1775-v3.patch

Responses to the review comments:

* "getSmapBasedCumulativeRssmem() should be private" - Fixed.
* "When converting #pages to bytes, use PAGE_SIZE instead of hard-coding 1024." - The smaps information is in kB, which needs to be converted to bytes. PAGE_SIZE will mostly be 4096, which would give a wrong value in getSmapBasedCumulativeRssmem.
* "Move the constant PROCFS_SMAPS_ENABLED to YarnConfiguration" - Fixed.
* Suggested renames: PROCFS_SMAPS_ENABLED -> PROCFS_USE_SMAPS_BASED_RSS, and yarn.nodemanager.container-monitor.process-tree.smaps.enabled -> yarn.nodemanager.container-monitor.procfs-based-proces-tree.smaps-based-rss.enabled. (Did I just say that?) - Fixed (yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled). Still long, I believe.
* ProcessMemInfo -> ProcessTreeSmapMemInfo?, MemoryMappingInfo -> ProcessSmapMemoryInfo, moduleMemList -> memoryInfoList; processSMAPTree should be cleared in every iteration of updating the process-tree - Fixed.
* "isSmapEnabled() should be private" - Removed this method completely. As part of the setConf() call, smapEnabled is computed.
* "MemoryMappingInfo.updateModuleMemInfo: We should skip everything else when we run into an integer parsing issue of the value. Right now you are logging, ignoring and continuing." - Fixed.
* "Rename MEM_INFO to MemInfo to go with other enums in the source?" - Fixed.
* "We should probably switch the following two ifs?" - Fixed.
* Javadoc error - Fixed. Reformatted the testcase as well.
* "While enforcing memory constraints, I wonder if people would want to use any other definitions of RSS to be more conservative or aggressive. Do you think it would make sense to provide these options separately, and have what you have as the default? We can punt this to a different JIRA, just wanted to bring it up." - This option can be provided as an advanced/expert configuration. We can have a separate JIRA to track it separately. Please feel free to open a new JIRA.

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:
-----------------------------------
    Attachment: YARN-1775-v4.patch

Renaming the patch as v4.

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941725#comment-13941725 ]

Rajesh Balamohan commented on YARN-1775:
----------------------------------------

[~vinodkv] - Sure, it would be great if this can be accommodated in ProcfsBasedProcessTree itself. I will add a config flag to enable/disable smaps-based computation (default is disabled). I will upload the patch soon.

To get the realistic RSS of a process, we need Private_Clean + Private_Dirty + Shared_Dirty of the process (i.e. shared_dirty/n, where n is the number of processes sharing). If we sum up all Shared_Dirty of memory regions, we would be double counting across processes. If we take PSS, then we would end up counting Shared_Clean as well. So a closer approximation of the real shared_dirty can be obtained by doing Min(shared_dirty, PSS) = Min(shared_dirty, (shared_dirty+shared_clean)/n); this will fall somewhere between shared_dirty and shared_dirty/n. Hence, a closer approximation of RSS = Math.min(sharedDirty, pss) + privateDirty + privateClean.

[~cnauroth] - "Also interesting would be confirming that containers still get killed for exceeding the limit with private/non-shared pages." Had an offline discussion with Gopal. Based on the suggestion, I tried ByteBuffer.allocateDirect(), which creates enough PRIVATE_DIRTY in the process. The process got killed once it exceeded the physical memory limits of the container.

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: YARN-1775-v2.patch, yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
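The approximation derived in the comment above can be stated compactly as a per-region computation. The class and method names below are illustrative, not the patch's actual API; only the formula itself comes from the comment:

```java
public class SmapRss {

    // Closer approximation of per-region RSS, per the comment:
    //   RSS ~= min(shared_dirty, PSS) + private_dirty + private_clean
    // All values are in kB, as reported by /proc/<pid>/smaps.
    static long approxRssKb(long sharedDirtyKb, long pssKb,
                            long privateDirtyKb, long privateCleanKb) {
        return Math.min(sharedDirtyKb, pssKb) + privateDirtyKb + privateCleanKb;
    }

    public static void main(String[] args) {
        // A region whose 400 kB of shared-dirty pages are split across
        // several processes: PSS (150 kB) already amortizes the sharing,
        // so min(shared_dirty, PSS) avoids double counting.
        System.out.println(approxRssKb(400, 150, 1000, 200)); // 1350
    }
}
```

Taking the min captures both regimes the comment describes: when PSS is smaller than shared_dirty (heavily shared region), the amortized value wins; when a region is private to one process, shared_dirty is 0 and contributes nothing.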
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:
-----------------------------------
    Attachment: YARN-1775-v2.patch

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: YARN-1775-v2.patch, yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:
-----------------------------------
    Attachment: YARN-1775-v3.patch

Attaching the correct patch.

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:
-----------------------------------
    Attachment: yarn-1775-2.4.0.patch

Computes the RSS by reading /proc/pid/smaps. Tested with branch 2.4.0 on a 20-node cluster.

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934829#comment-13934829 ]

Rajesh Balamohan commented on YARN-1775:
----------------------------------------

Review request link: https://reviews.apache.org/r/19220/

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:
-----------------------------------
    Fix Version/s: 2.5.0

> Create SMAPBasedProcessTree to get PSS information
> --------------------------------------------------
>
>                 Key: YARN-1775
>                 URL: https://issues.apache.org/jira/browse/YARN-1775
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>             Fix For: 2.5.0
>         Attachments: yarn-1775-2.4.0.patch
>
> Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
[jira] [Created] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
Rajesh Balamohan created YARN-1775:
-----------------------------------

             Summary: Create SMAPBasedProcessTree to get PSS information
                 Key: YARN-1775
                 URL: https://issues.apache.org/jira/browse/YARN-1775
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Rajesh Balamohan
            Priority: Minor

Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.