[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552209#comment-14552209 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552186#comment-14552186 ] Hudson commented on YARN-2821: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/CHANGES.txt Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552597#comment-14552597 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552554#comment-14552554 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java * hadoop-yarn-project/CHANGES.txt Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552392#comment-14552392 ] Hudson commented on YARN-2821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552442#comment-14552442 ] Hudson commented on YARN-2821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550216#comment-14550216 ] Hadoop QA commented on YARN-2821: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 42m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733765/YARN-2821.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eb4c9dd | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7995/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7995/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7995/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551273#comment-14551273 ] Hudson commented on YARN-2821: -- FAILURE: Integrated in Hadoop-trunk-Commit #7868 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7868/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547966#comment-14547966 ] Hadoop QA commented on YARN-2821: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 36s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 42m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733521/YARN-2821.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 363c355 | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7967/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7967/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7967/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549099#comment-14549099 ] Jian He commented on YARN-2821: --- thanks Varun ! looks good overall. test case - for below, we need to test that if AM receives any unknown completed container, the numCompletedContainers still equals to the numTotalContainers {code} // ignore containers we know nothing about - probably from a previous // attempt if (!launchedContainers.contains(containerStatus.getContainerId())) { LOG.info(Ignoring completed status of + containerStatus.getContainerId() + ; unknown container(probably launched by previous attempt)); continue; } {code} Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543206#comment-14543206 ] Jian He commented on YARN-2821: --- The current patch makes sense because there's no way to figure out previously finished apps other than persisting. But I'm thinking if it is a bit over-kill to do this for an example app. One thing in my mind is that, inside the onContainersCompleted we can filter out the previously attempts' finished containers and only do it for current attempt's finished containers. And in the finish() method, we compare numFinishedContainersOfcurrentAttempt==numContainersAskedByCurrentAttempt; will this work ? I know doing this, the total finished containers may be larger than user specified, but given this is just an example app, maybe tolerable ? On the other hand, the current patch may not work on secure cluster because it communicates with hdfs. {{renameScriptFile}} method is an example to talk with hdfs. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534637#comment-14534637 ] Varun Vasudev commented on YARN-2821: - [~jianhe] - can you please review? Thanks! Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_04 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534626#comment-14534626 ] Hadoop QA commented on YARN-2821: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 4s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 23s | The applied patch generated 2 new checkstyle issues (total was 150, now 150). | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 37s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 7m 2s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 43m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731438/YARN-2821.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 241a72a | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7806/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7806/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7806/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7806/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy:
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533233#comment-14533233 ] Hadoop QA commented on YARN-2821: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 49s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 22s | The applied patch generated 6 new checkstyle issues (total was 151, now 155). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 35s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 15m 25s | Tests failed in hadoop-yarn-applications-distributedshell. | | | | 50m 56s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels | | Timed out tests | org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731235/YARN-2821.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / daf3e4e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7770/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7770/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7770/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7770/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525991#comment-14525991 ] Hadoop QA commented on YARN-2821: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 1s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 20s | The applied patch generated 3 new checkstyle issues (total was 47, now 50). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 36s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 7m 1s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 42m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12680098/apache-yarn-2821.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a319771 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7675/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7675/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7675/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7675/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525660#comment-14525660 ] Hadoop QA commented on YARN-2821: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 22s | The applied patch generated 3 new checkstyle issues (total was 46, now 49). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 36s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 42m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12680098/apache-yarn-2821.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e8d0ee5 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7669/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7669/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7669/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7669/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202949#comment-14202949 ] Jian He commented on YARN-2821: --- - we can use {{Collections.synchronizedSet(new HashSet());}} instead of concurrentHashMap - entry in releasedContainers once added , never removed. - found that actually in this case, the container will expire and the exit status is ABORTED, numCompletedContainers will not be incremented in onContainersCompleted {code} // no need to update numCompletedContainers because we'll get // a notification from the RM anyway and we'll handle it there {code} Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200997#comment-14200997 ] Hadoop QA commented on YARN-2821: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679948/apache-yarn-2821.0.patch against trunk revision 1670578. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5757//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5757//console This message is automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2821.0.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04,
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201183#comment-14201183 ] Jian He commented on YARN-2821: --- thanks Varun ! the patch should solve the problem to release extra allocated containers. but seems in some cases, we may still run into the infinite loop. In finish(), it's checking {{numCompletedContainers.get() != numTotalContainers}}, but numCompletedContainers could be incremented elsewhere. e.g. {{onStartContainerError}} will also increase the numCompletedContainers count. Could the numCompletedContainers go beyond the numTotalContainers in such scenario also ? Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2821.0.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_04 14/11/04
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201681#comment-14201681 ] Hadoop QA commented on YARN-2821: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680098/apache-yarn-2821.1.patch against trunk revision 61effcb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5777//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5777//console This message is automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO