[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552209#comment-14552209
 ] 

Hudson commented on YARN-2821:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/933/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java


 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552186#comment-14552186
 ] 

Hudson commented on YARN-2821:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* hadoop-yarn-project/CHANGES.txt


 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552597#comment-14552597
 ] 

Hudson commented on YARN-2821:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java


 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552554#comment-14552554
 ] 

Hudson commented on YARN-2821:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java
* hadoop-yarn-project/CHANGES.txt


 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552392#comment-14552392
 ] 

Hudson commented on YARN-2821:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java


 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552442#comment-14552442
 ] 

Hudson commented on YARN-2821:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550216#comment-14550216
 ] 

Hadoop QA commented on YARN-2821:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 18s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 35s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 56s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| | |  42m 15s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733765/YARN-2821.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / eb4c9dd |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7995/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7995/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7995/console |


This message was automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551273#comment-14551273
 ] 

Hudson commented on YARN-2821:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7868 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7868/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java


 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547966#comment-14547966
 ] 

Hadoop QA commented on YARN-2821:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 36s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   6m 58s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| | |  42m 25s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733521/YARN-2821.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 363c355 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7967/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7967/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7967/console |


This message was automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549099#comment-14549099
 ] 

Jian He commented on YARN-2821:
---

thanks Varun ! looks good overall.
test case - for below, we need to test that if AM receives any unknown 
completed container, the numCompletedContainers still equals to the  
numTotalContainers
{code}
// ignore containers we know nothing about - probably from a previous
// attempt
if (!launchedContainers.contains(containerStatus.getContainerId())) {
  LOG.info(Ignoring completed status of 
  + containerStatus.getContainerId()
  + ; unknown container(probably launched by previous attempt));
  continue;
}
{code}

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 YARN-2821.004.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_03
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543206#comment-14543206
 ] 

Jian He commented on YARN-2821:
---

The current patch makes sense because there's no way to figure out previously 
finished apps other than persisting.  But I'm thinking if it is a bit over-kill 
to do this for an example app. One thing in my mind is that, inside the 
onContainersCompleted we can filter out the previously attempts' finished 
containers and only do it for current attempt's finished containers. And in the 
finish() method, we compare 
numFinishedContainersOfcurrentAttempt==numContainersAskedByCurrentAttempt; will 
this work ? I know doing this, the total finished containers may be larger than 
user specified, but given this is just an example app, maybe tolerable ? 

On the other hand, the current patch may not work on secure cluster because it 
communicates with hdfs. {{renameScriptFile}} method is an example to talk with 
hdfs.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-08 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534637#comment-14534637
 ] 

Varun Vasudev commented on YARN-2821:
-

[~jianhe] - can you please review? Thanks!

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_03
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_05
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_04
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_05
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_03
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534626#comment-14534626
 ] 

Hadoop QA commented on YARN-2821:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  4s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 54s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 23s | The applied patch generated  2 
new checkstyle issues (total was 150, now 150). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 37s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   7m  2s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| | |  43m 25s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731438/YARN-2821.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 241a72a |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7806/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7806/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7806/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7806/console |


This message was automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
 apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533233#comment-14533233
 ] 

Hadoop QA commented on YARN-2821:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 49s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 22s | The applied patch generated  6 
new checkstyle issues (total was 151, now 155). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 35s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  15m 25s | Tests failed in 
hadoop-yarn-applications-distributedshell. |
| | |  50m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels |
| Timed out tests | 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731235/YARN-2821.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / daf3e4e |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7770/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7770/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7770/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7770/console |


This message was automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-2821.002.patch, apache-yarn-2821.0.patch, 
 apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525991#comment-14525991
 ] 

Hadoop QA commented on YARN-2821:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  1s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 20s | The applied patch generated  3 
new checkstyle issues (total was 47, now 50). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 36s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   7m  1s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| | |  42m 14s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12680098/apache-yarn-2821.1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a319771 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7675/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7675/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7675/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7675/console |


This message was automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525660#comment-14525660
 ] 

Hadoop QA commented on YARN-2821:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 22s | The applied patch generated  3 
new checkstyle issues (total was 46, now 49). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 36s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   6m 58s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| | |  42m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12680098/apache-yarn-2821.1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e8d0ee5 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7669/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7669/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7669/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7669/console |


This message was automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2014-11-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202949#comment-14202949
 ] 

Jian He commented on YARN-2821:
---

- we can use {{Collections.synchronizedSet(new HashSet());}} instead of 
concurrentHashMap
- entry in releasedContainers once added , never removed.
-  found that actually in this case, the container will expire and the exit 
status is ABORTED, numCompletedContainers will not be incremented in 
onContainersCompleted
{code}
// no need to update numCompletedContainers because we'll get
  // a notification from the RM anyway and we'll handle it there
{code}

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_03
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_05
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2014-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200997#comment-14200997
 ] 

Hadoop QA commented on YARN-2821:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12679948/apache-yarn-2821.0.patch
  against trunk revision 1670578.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5757//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5757//console

This message is automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2821.0.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2014-11-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201183#comment-14201183
 ] 

Jian He commented on YARN-2821:
---

thanks Varun ! the patch should solve the problem to release extra allocated 
containers. but  seems in some cases, we  may still run into the infinite loop.
In finish(), it's checking {{numCompletedContainers.get() != 
numTotalContainers}}, but numCompletedContainers could be incremented 
elsewhere.  e.g. {{onStartContainerError}} will also increase the 
numCompletedContainers count. Could the numCompletedContainers go beyond the 
numTotalContainers in such scenario also ?

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2821.0.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_04, 
 containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_05, 
 containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_03
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_05
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_04
 14/11/04 

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2014-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201681#comment-14201681
 ] 

Hadoop QA commented on YARN-2821:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12680098/apache-yarn-2821.1.patch
  against trunk revision 61effcb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5777//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5777//console

This message is automatically generated.

 Distributed shell app master becomes unresponsive sometimes
 ---

 Key: YARN-2821
 URL: https://issues.apache.org/jira/browse/YARN-2821
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.5.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch


 We've noticed that once in a while the distributed shell app master becomes 
 unresponsive and is eventually killed by the RM. snippet of the logs -
 {noformat}
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
 appattempt_1415123350094_0017_01 received 0 previous attempts' running 
 containers on AM registration.
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
 container ask: Capability[memory:10, vCores:1]Priority[0]
 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez2:45454
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_02, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
 container launch container for 
 containerid=container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 START_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
 QUERY_CONTAINER for Container container_1415123350094_0017_01_02
 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
 onprem-tez2:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez3:45454
 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
 onprem-tez4:45454
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
 RM for container ask, allocatedCnt=3
 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
 command on a new container., 
 containerId=container_1415123350094_0017_01_03, 
 containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
 containerResourceMemory1024, containerResourceVirtualCores1
 14/11/04 18:21:39 INFO