[jira] [Created] (YARN-11039) LogAggregationFileControllerFactory::getFileControllerForRead should close FS

2021-12-08 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created YARN-11039:
---

 Summary: 
LogAggregationFileControllerFactory::getFileControllerForRead should close FS 
 Key: YARN-11039
 URL: https://issues.apache.org/jira/browse/YARN-11039
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation
Reporter: Rajesh Balamohan


LogAggregationFileControllerFactory::getFileControllerForRead internally opens a new FS 
object every time and never closes it.

When cloud connectors (e.g. s3a) are used along with Knox, this ends up leaking a 
KnoxTokenMonitor for every unclosed FS object, causing thread leaks in the NM.

Lines of interest:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileControllerFactory.java#L167]
{noformat}
try {
  Path remoteAppLogDir = fileController.getOlderRemoteAppLogDir(appId,
      appOwner);
  if (LogAggregationUtils.getNodeFiles(conf, remoteAppLogDir, appId,
      appOwner).hasNext()) {
    return fileController;
  }
} catch (Exception ex) {
  diagnosticsMsg.append(ex.getMessage() + "\n");
  continue;
}
{noformat}
[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java#L252]
{noformat}

  public static RemoteIterator<FileStatus> getNodeFiles(Configuration conf,
      Path remoteAppLogDir, ApplicationId appId, String appOwner)
      throws IOException {
    Path qualifiedLogDir =
        FileContext.getFileContext(conf).makeQualified(remoteAppLogDir);
    return FileContext.getFileContext(
        qualifiedLogDir.toUri(), conf).listStatus(remoteAppLogDir);
  }
{noformat}
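
One possible direction (a sketch only, not the actual YARN-11039 patch) is to resolve the 
FileSystem explicitly via FileSystem.newInstance() so the shared FS cache stays untouched, 
materialize the listing, and close the FS before returning. The appId/appOwner parameters 
of the real method are omitted here for brevity:
{noformat}
// Hedged sketch only; the committed fix may differ. newInstance() avoids the
// shared FileSystem cache, so closing it is safe and releases per-FS
// resources such as s3a's KnoxTokenMonitor.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public static RemoteIterator<FileStatus> getNodeFiles(Configuration conf,
    Path remoteAppLogDir) throws IOException {
  FileSystem fs = FileSystem.newInstance(remoteAppLogDir.toUri(), conf);
  try {
    List<FileStatus> files = new ArrayList<>();
    RemoteIterator<FileStatus> it =
        fs.listStatusIterator(fs.makeQualified(remoteAppLogDir));
    while (it.hasNext()) {
      files.add(it.next());
    }
    final Iterator<FileStatus> inner = files.iterator();
    // Re-wrap the eagerly loaded listing so callers keep the same return type.
    return new RemoteIterator<FileStatus>() {
      @Override public boolean hasNext() { return inner.hasNext(); }
      @Override public FileStatus next() { return inner.next(); }
    };
  } finally {
    fs.close();
  }
}
{noformat}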






[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-04-10 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962766#comment-15962766
 ] 

Rajesh Balamohan commented on YARN-5764:


[~devaraj.k] Thank you for sharing the patch and the results.  Recent JVM 
versions have {{-XX:+UseNUMA}} (check with {{java -XX:+PrintFlagsFinal -version | grep UseNUMA}}). 
Enabling it instructs the JVM to be NUMA aware, and the GC can take advantage of 
this. 

Was this flag ({{-XX:+UseNUMA}}) enabled in the task JVMs when running the 
benchmark? 

Hive on MR is outdated, network intensive and slow. It would be great if the BB 
benchmark could be run with Hive on Tez, which optimizes queries to a great 
extent. It has much better resource utilization, eliminates a lot of IO 
barriers, and is a lot more efficient than the MR codebase.
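
For reference, a hedged example of how the flag could be checked and passed to MR task 
JVMs (the property names below are the standard MapReduce ones and are illustrative only; 
Tez and other engines have their own launch-opts settings):
{noformat}
# Confirm the flag exists on the JVM used by the NodeManagers
java -XX:+PrintFlagsFinal -version | grep UseNUMA

# Illustrative: append the flag to the task JVM options
mapreduce.map.java.opts=-Xmx2048m -XX:+UseNUMA
mapreduce.reduce.java.opts=-Xmx2048m -XX:+UseNUMA
{noformat}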

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.






[jira] [Commented] (YARN-6017) node manager physical memory leak

2016-12-23 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772254#comment-15772254
 ] 

Rajesh Balamohan commented on YARN-6017:


No. From the JVM's accounting perspective it is still at 2048 as per the logs. But 
we need to check whether it has anything to do with the JVM's internal code itself or with Netty. 
Have you tried other JDK versions?

> node manager physical memory leak
> -
>
> Key: YARN-6017
> URL: https://issues.apache.org/jira/browse/YARN-6017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>Reporter: chenrongwei
> Attachments: 31169_smaps.txt, 31169_smaps.txt
>
>
> In our production environment, the node manager's JVM heap has been set to 
> '-Xmx2048m', but we noticed that after running for a long time the process' 
> actual physical memory size had reached 12g (we got this value from the top 
> command as follows).
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 31169 data  20   0 13.2g  12g 6092 S 16.9 13.0  49183:13 java
> 31169:   /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m 
> -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dhadoop.log.file=yarn-data-nodemanager.log 
> -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data 
> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M 
> -XX:+UseC
> Address   Kbytes Mode  Offset   DeviceMapping
> 0040   4 r-x--  008:1 java
> 0060   4 rw---  008:1 java
> 00601000 10094936 rw---  000:0   [ anon ]
> 00077000 2228224 rw---  000:0   [ anon ]
> 0007f800  131072 rw---  000:0   [ anon ]
> 00325ee0 128 r-x--  008:1 ld-2.12.so
> 00325f01f000   4 r 0001f000 008:1 ld-2.12.so
> 00325f02   4 rw--- 0002 008:1 ld-2.12.so
> 00325f021000   4 rw---  000:0   [ anon ]
> 00325f201576 r-x--  008:1 libc-2.12.so
> 00325f38a0002048 - 0018a000 008:1 libc-2.12.so
> 00325f58a000  16 r 0018a000 008:1 libc-2.12.so
> 00325f58e000   4 rw--- 0018e000 008:1 libc-2.12.so
> 00325f58f000  20 rw---  000:0   [ anon ]
> 00325f60  92 r-x--  008:1 libpthread-2.12.so
> 00325f6170002048 - 00017000 008:1 libpthread-2.12.so
> 00325f817000   4 r 00017000 008:1 libpthread-2.12.so
> 00325f818000   4 rw--- 00018000 008:1 libpthread-2.12.so
> 00325f819000  16 rw---  000:0   [ anon ]
> 00325fa0   8 r-x--  008:1 libdl-2.12.so
> 00325fa020002048 - 2000 008:1 libdl-2.12.so
> 00325fc02000   4 r 2000 008:1 libdl-2.12.so
> 00325fc03000   4 rw--- 3000 008:1 libdl-2.12.so
> 00325fe0  28 r-x--  008:1 librt-2.12.so
> 00325fe070002044 - 7000 008:1 librt-2.12.so
> 003260006000   4 r 6000 008:1 librt-2.12.so
> 003260007000   4 rw--- 7000 008:1 librt-2.12.so
> 00326020 524 r-x--  008:1 libm-2.12.so
> 0032602830002044 - 00083000 008:1 libm-2.12.so
> 003260482000   4 r 00082000 008:1 libm-2.12.so
> 003260483000   4 rw--- 00083000 008:1 libm-2.12.so
> 00326120  88 r-x--  008:1 libresolv-2.12.so
> 0032612160002048 - 00016000 008:1 libresolv-2.12.so
> 003261416000   4 r 00016000 008:1 libresolv-2.12.so
> 003261417000   4 rw--- 00017000 008:1 libresolv-2.12.so






[jira] [Commented] (YARN-6017) node manager physical memory leak

2016-12-22 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772212#comment-15772212
 ] 

Rajesh Balamohan commented on YARN-6017:


"10193716 in [heap]" in smaps indicates that it is from native side.

> node manager physical memory leak
> -
>
> Key: YARN-6017
> URL: https://issues.apache.org/jira/browse/YARN-6017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>Reporter: chenrongwei
> Attachments: 31169_smaps.txt, 31169_smaps.txt
>
>
> In our production environment, the node manager's JVM heap has been set to 
> '-Xmx2048m', but we noticed that after running for a long time the process' 
> actual physical memory size had reached 12g (we got this value from the top 
> command as follows).
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 31169 data  20   0 13.2g  12g 6092 S 16.9 13.0  49183:13 java
> 31169:   /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m 
> -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dhadoop.log.file=yarn-data-nodemanager.log 
> -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data 
> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M 
> -XX:+UseC
> Address   Kbytes Mode  Offset   DeviceMapping
> 0040   4 r-x--  008:1 java
> 0060   4 rw---  008:1 java
> 00601000 10094936 rw---  000:0   [ anon ]
> 00077000 2228224 rw---  000:0   [ anon ]
> 0007f800  131072 rw---  000:0   [ anon ]
> 00325ee0 128 r-x--  008:1 ld-2.12.so
> 00325f01f000   4 r 0001f000 008:1 ld-2.12.so
> 00325f02   4 rw--- 0002 008:1 ld-2.12.so
> 00325f021000   4 rw---  000:0   [ anon ]
> 00325f201576 r-x--  008:1 libc-2.12.so
> 00325f38a0002048 - 0018a000 008:1 libc-2.12.so
> 00325f58a000  16 r 0018a000 008:1 libc-2.12.so
> 00325f58e000   4 rw--- 0018e000 008:1 libc-2.12.so
> 00325f58f000  20 rw---  000:0   [ anon ]
> 00325f60  92 r-x--  008:1 libpthread-2.12.so
> 00325f6170002048 - 00017000 008:1 libpthread-2.12.so
> 00325f817000   4 r 00017000 008:1 libpthread-2.12.so
> 00325f818000   4 rw--- 00018000 008:1 libpthread-2.12.so
> 00325f819000  16 rw---  000:0   [ anon ]
> 00325fa0   8 r-x--  008:1 libdl-2.12.so
> 00325fa020002048 - 2000 008:1 libdl-2.12.so
> 00325fc02000   4 r 2000 008:1 libdl-2.12.so
> 00325fc03000   4 rw--- 3000 008:1 libdl-2.12.so
> 00325fe0  28 r-x--  008:1 librt-2.12.so
> 00325fe070002044 - 7000 008:1 librt-2.12.so
> 003260006000   4 r 6000 008:1 librt-2.12.so
> 003260007000   4 rw--- 7000 008:1 librt-2.12.so
> 00326020 524 r-x--  008:1 libm-2.12.so
> 0032602830002044 - 00083000 008:1 libm-2.12.so
> 003260482000   4 r 00082000 008:1 libm-2.12.so
> 003260483000   4 rw--- 00083000 008:1 libm-2.12.so
> 00326120  88 r-x--  008:1 libresolv-2.12.so
> 0032612160002048 - 00016000 008:1 libresolv-2.12.so
> 003261416000   4 r 00016000 008:1 libresolv-2.12.so
> 003261417000   4 rw--- 00017000 008:1 libresolv-2.12.so






[jira] [Commented] (YARN-6017) node manager physical memory leak

2016-12-22 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772116#comment-15772116
 ] 

Rajesh Balamohan commented on YARN-6017:


Usage seems to be from the native side. Can you also post "/proc/31169/smaps"?

> node manager physical memory leak
> -
>
> Key: YARN-6017
> URL: https://issues.apache.org/jira/browse/YARN-6017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>Reporter: chenrongwei
>
> In our production environment, the node manager's JVM heap has been set to 
> '-Xmx2048m', but we noticed that after running for a long time the process' 
> actual physical memory size had reached 12g (we got this value from the top 
> command as follows).
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 31169 data  20   0 13.2g  12g 6092 S 16.9 13.0  49183:13 java
> 31169:   /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m 
> -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dhadoop.log.file=yarn-data-nodemanager.log 
> -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data 
> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M 
> -XX:+UseC
> Address   Kbytes Mode  Offset   DeviceMapping
> 0040   4 r-x--  008:1 java
> 0060   4 rw---  008:1 java
> 00601000 10094936 rw---  000:0   [ anon ]
> 00077000 2228224 rw---  000:0   [ anon ]
> 0007f800  131072 rw---  000:0   [ anon ]
> 00325ee0 128 r-x--  008:1 ld-2.12.so
> 00325f01f000   4 r 0001f000 008:1 ld-2.12.so
> 00325f02   4 rw--- 0002 008:1 ld-2.12.so
> 00325f021000   4 rw---  000:0   [ anon ]
> 00325f201576 r-x--  008:1 libc-2.12.so
> 00325f38a0002048 - 0018a000 008:1 libc-2.12.so
> 00325f58a000  16 r 0018a000 008:1 libc-2.12.so
> 00325f58e000   4 rw--- 0018e000 008:1 libc-2.12.so
> 00325f58f000  20 rw---  000:0   [ anon ]
> 00325f60  92 r-x--  008:1 libpthread-2.12.so
> 00325f6170002048 - 00017000 008:1 libpthread-2.12.so
> 00325f817000   4 r 00017000 008:1 libpthread-2.12.so
> 00325f818000   4 rw--- 00018000 008:1 libpthread-2.12.so
> 00325f819000  16 rw---  000:0   [ anon ]
> 00325fa0   8 r-x--  008:1 libdl-2.12.so
> 00325fa020002048 - 2000 008:1 libdl-2.12.so
> 00325fc02000   4 r 2000 008:1 libdl-2.12.so
> 00325fc03000   4 rw--- 3000 008:1 libdl-2.12.so
> 00325fe0  28 r-x--  008:1 librt-2.12.so
> 00325fe070002044 - 7000 008:1 librt-2.12.so
> 003260006000   4 r 6000 008:1 librt-2.12.so
> 003260007000   4 rw--- 7000 008:1 librt-2.12.so
> 00326020 524 r-x--  008:1 libm-2.12.so
> 0032602830002044 - 00083000 008:1 libm-2.12.so
> 003260482000   4 r 00082000 008:1 libm-2.12.so
> 003260483000   4 rw--- 00083000 008:1 libm-2.12.so
> 00326120  88 r-x--  008:1 libresolv-2.12.so
> 0032612160002048 - 00016000 008:1 libresolv-2.12.so
> 003261416000   4 r 00016000 008:1 libresolv-2.12.so
> 003261417000   4 rw--- 00017000 008:1 libresolv-2.12.so






[jira] [Commented] (YARN-6017) node manager physical memory leak

2016-12-22 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771984#comment-15771984
 ] 

Rajesh Balamohan commented on YARN-6017:


Can you share the details of 
{noformat}
jmap -heap <pid>
{noformat}

Can you also get the heap dump if the process is still alive?

{noformat}
jmap -dump:format=b,file=/tmp/nm.hprof <pid>
{noformat}



> node manager physical memory leak
> -
>
> Key: YARN-6017
> URL: https://issues.apache.org/jira/browse/YARN-6017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>Reporter: chenrongwei
>
> In our production environment, the node manager's JVM heap has been set to 
> '-Xmx2048m', but we noticed that after running for a long time the process' 
> actual physical memory size had reached 12g (we got this value from the top 
> command as follows).
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 31169 data  20   0 13.2g  12g 6092 S 16.9 13.0  49183:13 java
> 31169:   /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m 
> -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dhadoop.log.file=yarn-data-nodemanager.log 
> -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data 
> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M 
> -XX:+UseC
> Address   Kbytes Mode  Offset   DeviceMapping
> 0040   4 r-x--  008:1 java
> 0060   4 rw---  008:1 java
> 00601000 10094936 rw---  000:0   [ anon ]
> 00077000 2228224 rw---  000:0   [ anon ]
> 0007f800  131072 rw---  000:0   [ anon ]
> 00325ee0 128 r-x--  008:1 ld-2.12.so
> 00325f01f000   4 r 0001f000 008:1 ld-2.12.so
> 00325f02   4 rw--- 0002 008:1 ld-2.12.so
> 00325f021000   4 rw---  000:0   [ anon ]
> 00325f201576 r-x--  008:1 libc-2.12.so
> 00325f38a0002048 - 0018a000 008:1 libc-2.12.so
> 00325f58a000  16 r 0018a000 008:1 libc-2.12.so
> 00325f58e000   4 rw--- 0018e000 008:1 libc-2.12.so
> 00325f58f000  20 rw---  000:0   [ anon ]
> 00325f60  92 r-x--  008:1 libpthread-2.12.so
> 00325f6170002048 - 00017000 008:1 libpthread-2.12.so
> 00325f817000   4 r 00017000 008:1 libpthread-2.12.so
> 00325f818000   4 rw--- 00018000 008:1 libpthread-2.12.so
> 00325f819000  16 rw---  000:0   [ anon ]
> 00325fa0   8 r-x--  008:1 libdl-2.12.so
> 00325fa020002048 - 2000 008:1 libdl-2.12.so
> 00325fc02000   4 r 2000 008:1 libdl-2.12.so
> 00325fc03000   4 rw--- 3000 008:1 libdl-2.12.so
> 00325fe0  28 r-x--  008:1 librt-2.12.so
> 00325fe070002044 - 7000 008:1 librt-2.12.so
> 003260006000   4 r 6000 008:1 librt-2.12.so
> 003260007000   4 rw--- 7000 008:1 librt-2.12.so
> 00326020 524 r-x--  008:1 libm-2.12.so
> 0032602830002044 - 00083000 008:1 libm-2.12.so
> 003260482000   4 r 00082000 008:1 libm-2.12.so
> 003260483000   4 rw--- 00083000 008:1 libm-2.12.so
> 00326120  88 r-x--  008:1 libresolv-2.12.so
> 0032612160002048 - 00016000 008:1 libresolv-2.12.so
> 003261416000   4 r 00016000 008:1 libresolv-2.12.so
> 003261417000   4 rw--- 00017000 008:1 libresolv-2.12.so






[jira] [Updated] (YARN-5551) Ignore file backed pages from memory computation when smaps is enabled

2016-10-11 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-5551:
---
Attachment: YARN-5551.branch-2.003.patch

Rebasing to address the checkstyle issues.

> Ignore file backed pages from memory computation when smaps is enabled
> --
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch, 
> YARN-5551.branch-2.002.patch, YARN-5551.branch-2.003.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7fbf2800-7fbf6800 rw-s  08:02 11927571   
> /tmp/7298569189125604642/arena-1291157252088664681.cache (deleted)
> Size:1048576 kB
> Rss:   17288 kB
> Pss:   17288 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean:   232 kB
> Private_Dirty: 17056 kB
> Referenced:17288 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  






[jira] [Comment Edited] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-09-22 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515481#comment-15515481
 ] 

Rajesh Balamohan edited comment on YARN-5551 at 9/23/16 5:43 AM:
-

Attaching the .2 version, which takes "anonymous" pages into account.  


was (Author: rajesh.balamohan):
Attaching .2 version which takes into account "anonymous".  

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch, 
> YARN-5551.branch-2.002.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7fbf2800-7fbf6800 rw-s  08:02 11927571   
> /tmp/7298569189125604642/arena-1291157252088664681.cache (deleted)
> Size:1048576 kB
> Rss:   17288 kB
> Pss:   17288 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean:   232 kB
> Private_Dirty: 17056 kB
> Referenced:17288 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  






[jira] [Updated] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-09-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-5551:
---
Attachment: YARN-5551.branch-2.002.patch

Attaching the .2 version, which takes "anonymous" pages into account.  

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch, 
> YARN-5551.branch-2.002.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7fbf2800-7fbf6800 rw-s  08:02 11927571   
> /tmp/7298569189125604642/arena-1291157252088664681.cache (deleted)
> Size:1048576 kB
> Rss:   17288 kB
> Pss:   17288 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean:   232 kB
> Private_Dirty: 17056 kB
> Referenced:17288 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  






[jira] [Comment Edited] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-25 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438057#comment-15438057
 ] 

Rajesh Balamohan edited comment on YARN-5551 at 8/25/16 11:37 PM:
--

This patch worked for the scenario we ran into. 

If the memory mapping of a file has anon=0, should that cause the process to be 
killed?

A more generic patch would be to figure out whether a memory mapping with anon=0 
should be a deciding factor for killing the process.


was (Author: rajesh.balamohan):
This patch worked for the scenario we ran into. 

If memory mapping of a file is anon=0, should that cause the process to be 
killed. 

A more generic patch would be figure out whether memory mapping with annon=0 
should be deciding factor for killing the process.

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7fbf2800-7fbf6800 rw-s  08:02 11927571   
> /tmp/7298569189125604642/arena-1291157252088664681.cache (deleted)
> Size:1048576 kB
> Rss:   17288 kB
> Pss:   17288 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean:   232 kB
> Private_Dirty: 17056 kB
> Referenced:17288 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  






[jira] [Commented] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-25 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438057#comment-15438057
 ] 

Rajesh Balamohan commented on YARN-5551:


This patch worked for the scenario we ran into. 

If the memory mapping of a file has anon=0, should that cause the process to be 
killed? 

A more generic patch would be to figure out whether a memory mapping with anon=0 
should be a deciding factor for killing the process.

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7fbf2800-7fbf6800 rw-s  08:02 11927571   
> /tmp/7298569189125604642/arena-1291157252088664681.cache (deleted)
> Size:1048576 kB
> Rss:   17288 kB
> Pss:   17288 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean:   232 kB
> Private_Dirty: 17056 kB
> Referenced:17288 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  






[jira] [Updated] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-5551:
---
Attachment: YARN-5551.branch-2.001.patch

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7f6123f99000-7f6163f99000 rw-p  08:41 211419477  
> /grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache
>  (deleted)
> Size:1048576 kB
> Rss:  637292 kB
> Pss:  637292 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:637292 kB
> Referenced:   637292 kB
> Anonymous:637292 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  






[jira] [Updated] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-5551:
---
Target Version/s: 2.7.3

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7f6123f99000-7f6163f99000 rw-p  08:41 211419477  
> /grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache
>  (deleted)
> Size:1048576 kB
> Rss:  637292 kB
> Pss:  637292 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:637292 kB
> Referenced:   637292 kB
> Anonymous:637292 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  






[jira] [Created] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-22 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created YARN-5551:
--

 Summary: Ignore deleted file mapping from memory computation when 
smaps is enabled
 Key: YARN-5551
 URL: https://issues.apache.org/jira/browse/YARN-5551
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Minor


Currently, deleted file mappings are also included in the memory computation 
when smaps is enabled. For example:

{noformat}
7f612004a000-7f612004c000 rw-s  00:10 4201507513 
/dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185 
(deleted)
Size:  8 kB
Rss:   4 kB
Pss:   2 kB
Shared_Clean:  0 kB
Shared_Dirty:  4 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced:4 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap:  0 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB


7f6123f99000-7f6163f99000 rw-p  08:41 211419477  
/grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache
 (deleted)
Size:1048576 kB
Rss:  637292 kB
Pss:  637292 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty:637292 kB
Referenced:   637292 kB
Anonymous:637292 kB
AnonHugePages: 0 kB
Swap:  0 kB
KernelPageSize:4 kB
{noformat}

It would be good to exclude these from getSmapBasedRssMemorySize() computation. 
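A hedged sketch of one way to do this (illustrative only; the committed patch may instead 
key off the anonymous-page counters):
{noformat}
// Illustrative helper, not the actual patch: smaps marks mappings whose
// backing file was removed with a trailing "(deleted)"; such mappings could
// be skipped when summing the smaps-based RSS.
boolean isDeletedFileMapping(String mappingName) {
  return mappingName != null && mappingName.trim().endsWith("(deleted)");
}
{noformat}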
 






[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl

2016-06-27 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351356#comment-15351356
 ] 

Rajesh Balamohan commented on YARN-5296:


Based on an offline conversation with [~karams], I have changed the assignee to 
[~djp]. 
\cc [~djp]

> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---
>
> Key: YARN-5296
> URL: https://issues.apache.org/jira/browse/YARN-5296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 2.9.0
>Reporter: Karam Singh
>Assignee: Junping Du
>
> Ran tests in the following manner:
> 1. Ran GridMix with 768 apps sequentially around 17 times to execute about 12.9K 
> apps.
> 2. After 4-5 hrs, checked the NM heap using Memory Analyser. It reported around 
> 96% of the heap being used by ContainerMetrics.
> 3. Ran 7 more GridMix runs to have around 18.2K apps run in total. Checked the 
> NM heap using Memory Analyser again; again 96% of the heap was being used by 
> ContainerMetrics. 
> 4. Started one more GridMix run; while the run was going on, NMs started going down 
> with OOM at around 18.7K+ running. On analysing the NM heap using Memory Analyser, 
> the OOM was caused by ContainerMetrics.






[jira] [Updated] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl

2016-06-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-5296:
---
Assignee: Junping Du

> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---
>
> Key: YARN-5296
> URL: https://issues.apache.org/jira/browse/YARN-5296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 2.9.0
>Reporter: Karam Singh
>Assignee: Junping Du
>
> Ran tests in the following manner:
> 1. Ran GridMix with 768 apps sequentially around 17 times to execute about 12.9K 
> apps.
> 2. After 4-5 hrs, checked the NM heap using Memory Analyser. It reported around 
> 96% of the heap being used by ContainerMetrics.
> 3. Ran 7 more GridMix runs to have around 18.2K apps run in total. Checked the 
> NM heap using Memory Analyser again; again 96% of the heap was being used by 
> ContainerMetrics. 
> 4. Started one more GridMix run; while the run was going on, NMs started going down 
> with OOM at around 18.7K+ running. On analysing the NM heap using Memory Analyser, 
> the OOM was caused by ContainerMetrics.






[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-17 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700657#comment-14700657
 ] 

Rajesh Balamohan commented on YARN-3942:


Should this be resilient to cluster restarts? For example, when a cluster restart 
happens, the timeline server automatically gets killed with the following exception.

{noformat}
2015-08-18 01:03:31,523 [EntityLogPluginWorker #6] ERROR 
org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore: Error scanning 
active files
...
...
[EntityLogPluginWorker #0] ERROR 
org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore: Error scanning 
active files
java.io.EOFException: End of File Exception between local host is: 
atsmachine; destination host is: m1:8020; : java.io.EOFException; For more 
details see:  http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
at org.apache.hadoop.ipc.Client.call(Client.java:1444)
at org.apache.hadoop.ipc.Client.call(Client.java:1371)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy26.getListing(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:574)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy27.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1748)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:973)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:984)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:956)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:935)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:931)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:943)
at 
org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore.scanActiveLogs(EntityFileTimelineStore.java:314)
at 
org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore.access$1300(EntityFileTimelineStore.java:79)
at 
org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore$EntityLogScanner.run(EntityFileTimelineStore.java:771)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1098)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:993)
2015-08-18 01:03:35,600 [SIGTERM handler] ERROR 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
 RECEIVED SIGNAL 15: SIGTERM
2015-08-18 01:03:35,608 [Thread-1] INFO org.mortbay.log: Stopped 
HttpServer2$SelectChannelConnectorWithSafeStartup@atsmachine:8188
2015-08-18 01:03:35,710 [Thread-1] INFO 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping 
ApplicationHistoryServer metrics system...
2015-08-18 01:03:35,712 [Thread-1] INFO 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
{noformat}

[jira] [Created] (YARN-3797) NodeManager not blacklisting the disk (shuffle) with errors

2015-06-11 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created YARN-3797:
--

 Summary: NodeManager not blacklisting the disk (shuffle) with 
errors
 Key: YARN-3797
 URL: https://issues.apache.org/jira/browse/YARN-3797
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rajesh Balamohan


In a multi-node environment, one of the disks (where map outputs are written) on 
a node went bad. Errors are given below.

{noformat}
Info fld=0x9ad090a
sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 ad 09 08 00 00 08 00
end_request: critical medium error, dev sdf, sector 162334984
mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
sd 6:0:5:0: [sdf]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:5:0: [sdf]  Sense Key : Medium Error [current]
Info fld=0x9af8892
sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
end_request: critical medium error, dev sdf, sector 162498704
mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
sd 6:0:5:0: [sdf]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:5:0: [sdf]  Sense Key : Medium Error [current]
Info fld=0x9af8892
sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
end_request: critical medium error, dev sdf, sector 162498704
{noformat}

The disk checker would pass, as the system allows creating and deleting 
directories without issues.  But the data being served out can be corrupt, and 
fetchers fail during CRC verification with unwanted delays and retries. 

Ideally the node manager should detect such errors and blacklist/remove those disks 
from the NM.






[jira] [Updated] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-10-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-2314:
---
Attachment: tez-yarn-2314.xlsx

Attaching the results of the getProxy() calls for Tez on 20 nodes with this patch, 
for different cache sizes and different data sizes (tested a job at 200GB and 
10 TB scale).  Overall, there is a slight degradation in performance (in 
milliseconds) when setting the cache size to 0, but it is not significant enough to 
make an impact on overall job runtime in Tez.

 ContainerManagementProtocolProxy can create thousands of threads for a large 
 cluster
 

 Key: YARN-2314
 URL: https://issues.apache.org/jira/browse/YARN-2314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-2314.patch, YARN-2314v2.patch, 
 disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch, 
 tez-yarn-2314.xlsx


 ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
 this cache is configurable.  However the cache can grow far beyond the 
 configured size when running on a large cluster and blow AM address/container 
 limits.  More details in the first comment.





[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Attachment: YARN-1775-v3.patch

 getSmapBasedCumulativeRssmem() should be private
-Fixed

 When converting #pages to bytes, use PAGE_SIZE instead of hard-coding 1024.
-The smaps information is in KB, which needs to be converted to bytes.  PAGE_SIZE 
will mostly be 4096, which would give a wrong value in getSmapBasedCumulativeRssmem.

 Move the constant PROCFS_SMAPS_ENABLED to YarnConfiguration
-Fixed.

 Suggestions for renames
 PROCFS_SMAPS_ENABLED - PROCFS_USE_SMAPS_BASED_RSS
 yarn.nodemanager.container-monitor.process-tree.smaps.enabled - 
 yarn.nodemanager.container-monitor.procfs-based-proces-tree.smaps-based-rss.enabled.
  (Did I just say that?  )
-Fixed 
(yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled).  
Still long I believe.

 ProcessMemInfo - ProcessTreeSmapMemInfo?, MemoryMappingInfo - 
 ProcessSmapMemoryInfo, moduleMemList - memoryInfoList, processSMAPTree 
 should be cleared in every iteration of updating the process-tree
-Fixed

 isSmapEnabled() should be private
-Removed this method completely. As a part of setConf() call, smapEnabled is 
computed.

 MemoryMappingInfo.updateModuleMemInfo: We should skip everything else when 
 we run into integer parsing issue of the value. Right now you are logging, 
 ignoring and continuing.
-Fixed

Rename MEM_INFO to MemInfo to go with other enums in the source?
-Fixed

We should probably switch the following two ifs?
-Fixed

Javadoc error
-Fixed
Reformatted the testcase as well.

While enforcing memory constraints, I wonder if people would want to use any 
other definitions of RSS to be more conservative or aggressive. Do you think 
it would make sense to provide these options separately, and have what you 
have as the default? We can punt this to a different JIRA, just wanted to 
bring it up.
-This option can be provided as advanced/expert configuration. We can have a 
separate JIRA to track it separately. Please feel free to open a new JIRA.


 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Attachment: YARN-1775-v4.patch

Renaming the patch as v4.

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941725#comment-13941725
 ] 

Rajesh Balamohan commented on YARN-1775:


[~vinodkv]
- Sure, it would be great if this could be accommodated in ProcfsBasedProcessTree 
itself.  I will add a config flag to enable/disable smaps-based computation 
(default is disabled).  I will upload the patch soon.
- To get the realistic RSS of a process, we need Private_Clean + Private_Dirty + 
Shared_Dirty of the process (i.e. shared_dirty/n, where n is the number of 
processes sharing).  If we sum up the Shared_Dirty of all memory regions, we would 
be double counting across processes.  If we take PSS, then we would end up 
counting Shared_Clean as well.  So a closer approximation of the real shared_dirty 
can be obtained by taking Min(shared_dirty, PSS) = Min(shared_dirty, 
(shared_dirty+shared_clean)/n); this will fall somewhere between shared_dirty 
and shared_dirty/n.  Hence, a closer approximation of RSS = Math.min(sharedDirty, 
pss) + privateDirty + privateClean.
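
A minimal sketch of that approximation (names are illustrative, not the actual 
ProcfsBasedProcessTree fields; inputs are the per-mapping smaps counters in KB):
{noformat}
// Hedged illustration of the RSS approximation described above.
long approximateRssKb(long sharedDirtyKb, long pssKb,
    long privateDirtyKb, long privateCleanKb) {
  // min(Shared_Dirty, PSS) approximates this process's share of the dirty
  // shared pages; private pages are counted in full.
  return Math.min(sharedDirtyKb, pssKb) + privateDirtyKb + privateCleanKb;
}
{noformat}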

[~cnauroth]

Also interesting would be confirming that containers still get killed for 
exceeding the limit with private/non-shared pages.

Had an offline discussion with Gopal.  Based on the suggestion, I tried 
ByteBuffer.allocateDirect(), which creates enough PRIVATE_DIRTY in the 
process.  The process got killed once it exceeded the physical memory limits of 
the container.
  

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Attachment: YARN-1775-v2.patch

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Attachment: YARN-1775-v3.patch

Attaching correct patch.

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Attachment: yarn-1775-2.4.0.patch

Computes the RSS by reading /proc/pid/smaps.  Tested with branch 2.4.0 on a 
20-node cluster.

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Priority: Minor
 Attachments: yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-14 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934829#comment-13934829
 ] 

Rajesh Balamohan commented on YARN-1775:


Review request link : https://reviews.apache.org/r/19220/

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Priority: Minor
 Attachments: yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Fix Version/s: 2.5.0

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Priority: Minor
 Fix For: 2.5.0

 Attachments: yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Created] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-02 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created YARN-1775:
--

 Summary: Create SMAPBasedProcessTree to get PSS information
 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Priority: Minor


Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
make use of PSS for computing the memory usage. 


