[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328848#comment-14328848 ]

Hudson commented on YARN-933:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #844 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/844/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
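The commit message only says that the InvalidStateTransitonException at FINAL_SAVING was fixed, and the patch is not reproduced here. The Java sketch below illustrates one general way to make this kind of race harmless. It models a tiny attempt state machine in which a late LAUNCHED or LAUNCH_FAILED result from the launcher is ignored, rather than rejected, once the attempt is already in FINAL_SAVING or FAILED. The class, both enums, and the SAVED event are hypothetical simplifications. They are not Hadoop's StateMachineFactory or RMAppAttemptImpl API, and this is not claimed to be what the YARN-933 patch does.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of an app-attempt state machine. State and event
// names borrow from the issue; this is NOT Hadoop's StateMachineFactory API.
public class AttemptStateSketch {

    enum State { ALLOCATED, LAUNCHED, FINAL_SAVING, FAILED }

    // SAVED is a made-up stand-in for "final state has been persisted".
    enum Event { LAUNCHED, LAUNCH_FAILED, EXPIRE, SAVED }

    // Explicit transitions: (current state, event) -> next state.
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);
    // Events tolerated as no-ops in a given state (late arrivals from the launcher).
    private static final Map<State, Set<Event>> IGNORED = new EnumMap<>(State.class);

    static {
        for (State s : State.values()) {
            TRANSITIONS.put(s, new EnumMap<>(Event.class));
            IGNORED.put(s, EnumSet.noneOf(Event.class));
        }
        TRANSITIONS.get(State.ALLOCATED).put(Event.LAUNCHED, State.LAUNCHED);
        TRANSITIONS.get(State.ALLOCATED).put(Event.LAUNCH_FAILED, State.FINAL_SAVING);
        TRANSITIONS.get(State.ALLOCATED).put(Event.EXPIRE, State.FINAL_SAVING);
        TRANSITIONS.get(State.FINAL_SAVING).put(Event.SAVED, State.FAILED);
        // Assumed remedy: launcher results that arrive after the attempt has started
        // saving its final state, or has already failed, are dropped.
        IGNORED.get(State.FINAL_SAVING).addAll(EnumSet.of(Event.LAUNCHED, Event.LAUNCH_FAILED));
        IGNORED.get(State.FAILED).addAll(EnumSet.of(Event.LAUNCHED, Event.LAUNCH_FAILED));
    }

    static State handle(State current, Event event) {
        if (IGNORED.get(current).contains(event)) {
            return current; // late event: stay in the current state instead of throwing
        }
        State next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            // Anything neither mapped nor ignored is still reported as invalid.
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        // The AM container expires while the launcher is still retrying the NM connection.
        State s = handle(State.ALLOCATED, Event.EXPIRE); // -> FINAL_SAVING
        s = handle(s, Event.LAUNCHED);                   // late result, ignored
        s = handle(s, Event.SAVED);                      // -> FAILED
        s = handle(s, Event.LAUNCH_FAILED);              // late result, ignored
        System.out.println(s);
    }
}
```

Running `main` prints `FAILED`. An event that is neither mapped nor listed as ignored still throws, for example LAUNCH_FAILED while in LAUNCHED. Keeping the ignore list narrow means genuinely unexpected events stay visible.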


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 AM max retries configured as 3 on both the client and the RM side.
 Step 1: Install a cluster with NMs on 2 machines.
 Step 2: Make ping from the RM machine to the NM1 machine succeed by IP
 address but fail by hostname.
 Step 3: Execute a job.
 Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a
 connection loss happens.
 Observation:
 ==
 After AppAttempt_1 moved to the FAILED state, the release of its container
 and the removal of the application succeeded, and a new AppAttempt_2 was
 spawned.
 1. The retry for AppAttempt_1 then happens again.
 2. The RM again tries to launch AppAttempt_1, which fails with
 InvalidStateTransitonException.
 3. The client exited once AppAttempt_1 finished [but the job is actually
 still running], even though 3 app attempts are configured and the remaining
 app attempts were all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
  at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at
 
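The short Java program below makes the ordering in the log concrete by computing the gaps between three of the quoted timestamps. It uses only `java.time`. The `DateTimeFormatter` pattern is an assumption chosen to match the timestamp format in the lines above, and the class name is illustrative.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Computes gaps between RM log timestamps quoted in the report (illustrative helper).
public class LaunchTimeline {
    // Assumed to match timestamps such as "2013-07-17 16:22:51,013".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static Duration between(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT), LocalDateTime.parse(to, FMT));
    }

    public static void main(String[] args) {
        String allocated = "2013-07-17 16:22:51,013";    // attempt _01: SCHEDULED -> ALLOCATED
        String expired = "2013-07-17 16:36:07,091";      // AM container expired (600 secs timeout)
        String launchFailed = "2013-07-17 16:38:56,207"; // AMLauncher reports the launch error

        System.out.println("allocated -> expired:        " + between(allocated, expired).toMillis() + " ms");
        System.out.println("expired -> launch failure:   " + between(expired, launchFailed).toMillis() + " ms");
        System.out.println("allocated -> launch failure: " + between(allocated, launchFailed).toMillis() + " ms");
    }
}
```

The three gaps come out to 796078 ms, 169116 ms and 965194 ms. The AMLauncher's error for attempt _01 therefore reached the RM about 169 s after its container had expired. By then the scheduler had already marked attempt _01 done with finalState=FAILED and attempt _02 had been scheduled. This matches the LAUNCH_FAILED at FAILED event in the stack trace.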

[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328826#comment-14328826 ]

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #110 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/110/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328948#comment-14328948 ]

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #2042 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2042/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328954#comment-14328954 ]

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #101 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/101/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329020#comment-14329020 ]

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #111 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/111/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329050#comment-14329050 ]

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2061 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2061/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-19 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328348#comment-14328348 ]

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7158 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7158/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 am max retries configured as 3 at client and RM side.
 Step 1: Install cluster with NM on 2 Machines 
 Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But 
 using Hostname should fail
 Step 3: Execute a job
 Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , 
 connection loss happened.
 Observation :
 ==
 After AppAttempt_1 has moved to failed state ,release of container for 
 AppAttempt_1 and Application removal are successful. New AppAttempt_2 is 
 sponed.
 1. Then again retry for AppAttempt_1 happens.
 2. Again RM side it is trying to launch AppAttempt_1, hence fails with 
 InvalidStateTransitonException
 3. Client got exited after AppAttempt_1 is been finished [But actually job is 
 still running ], while the appattempts configured is 3 and rest appattempts 
 are all sponed and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
  at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at

[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-18 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326235#comment-14326235
 ] 

Rohith commented on YARN-933:
-

[~jianhe] kindly review the updated patch.


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326557#comment-14326557
 ] 

Jian He commented on YARN-933:
--

lgtm, +1


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313834#comment-14313834
 ] 

Hadoop QA commented on YARN-933:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697671/0004-YARN-933.patch
  against trunk revision b73956f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6573//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6573//console

This message is automatically generated.


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-09 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313727#comment-14313727
 ] 

Rohith commented on YARN-933:
-

Thanks [~jianhe] for updating the patch :-) To keep the test flow consistent with the RMAppAttemptImpl states, i.e. 
new -> submitted -> scheduled -> allocated_saving -> allocated --(kill event)--> 
final_saving -> killed, I'd like to add {{allocateApplicationAttempt()}}. 
I have updated the patch to do the verification at the killed state.
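The flow above can be sketched as a self-contained check. Everything in this sketch is hypothetical. The state names echo RMAppAttemptState. The events, the step table and runKillAfterAllocate() are invented for illustration. They are not the TestRMAppAttemptTransitions code or the real allocateApplicationAttempt() helper. The check walks an attempt to ALLOCATED, kills it, and injects a late LAUNCHED while it is in FINAL_SAVING. It then verifies that the attempt still ends in KILLED.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the test flow; not the actual YARN test code.
public class AttemptFlowCheck {
    enum State { NEW, SUBMITTED, SCHEDULED, ALLOCATED_SAVING, ALLOCATED, FINAL_SAVING, KILLED }
    enum Event { START, ATTEMPT_ADDED, CONTAINER_ALLOCATED, ALLOCATION_SAVED, KILL, LAUNCHED, UPDATE_SAVED }

    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    // Late launcher results are tolerated while the final state is being saved.
    private static final Set<Event> IGNORED_AT_FINAL_SAVING = EnumSet.of(Event.LAUNCHED);

    static {
        put(State.NEW, Event.START, State.SUBMITTED);
        put(State.SUBMITTED, Event.ATTEMPT_ADDED, State.SCHEDULED);
        put(State.SCHEDULED, Event.CONTAINER_ALLOCATED, State.ALLOCATED_SAVING);
        put(State.ALLOCATED_SAVING, Event.ALLOCATION_SAVED, State.ALLOCATED);
        put(State.ALLOCATED, Event.KILL, State.FINAL_SAVING);
        put(State.FINAL_SAVING, Event.UPDATE_SAVED, State.KILLED);
    }

    private static void put(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    static State step(State current, Event event) {
        if (current == State.FINAL_SAVING && IGNORED_AT_FINAL_SAVING.contains(event)) {
            return current; // ignore the late event, stay in FINAL_SAVING
        }
        Map<Event, State> row = TABLE.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    // Walks allocate -> kill -> (late LAUNCHED) -> saved and returns the final state.
    static State runKillAfterAllocate() {
        State s = State.NEW;
        // Stand-in for the allocation steps the comment refers to.
        for (Event e : new Event[] {Event.START, Event.ATTEMPT_ADDED,
                Event.CONTAINER_ALLOCATED, Event.ALLOCATION_SAVED}) {
            s = step(s, e);
        }
        s = step(s, Event.KILL);
        s = step(s, Event.LAUNCHED); // the late launch result
        return step(s, Event.UPDATE_SAVED);
    }

    public static void main(String[] args) {
        System.out.println(runKillAfterAllocate()); // KILLED
    }
}
```

This passes through ALLOCATED before the kill, which keeps the check consistent with the state path the comment describes. Otherwise the kill would arrive in an earlier state.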


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-09 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313286#comment-14313286
 ] 

Jian He commented on YARN-933:
--

bq. You should not ignore RMAppAttemptEventType.LAUNCHED? We will have to 
explicitly kill the AppAttempt and the AM in this case
The AM here is being killed. The allocated state receives the kill event, kills the 
AM (sends the cleanup event to the AM launcher) and then moves to the 
final_saving state.
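Here is a minimal sketch of the behaviour described in this comment, under stated assumptions. The class, the event names KILL and UPDATE_SAVED, and the launcherEvents list are hypothetical. This is not the committed YARN-933 patch and not Hadoop's StateMachineFactory API. In the sketch, ALLOCATED handles the kill by recording a cleanup request for the launcher and moving to FINAL_SAVING. FINAL_SAVING then accepts a late LAUNCHED (or LAUNCH_FAILED) as a no-op instead of throwing.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the fix direction; not the real YARN-933 patch.
public class TolerantAttemptMachine {
    enum State { ALLOCATED, FINAL_SAVING, KILLED }
    enum Event { KILL, LAUNCHED, LAUNCH_FAILED, UPDATE_SAVED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    // Events a state accepts but deliberately does nothing about.
    private final Map<State, Set<Event>> ignored = new EnumMap<>(State.class);
    // Stand-in for events sent to the AM launcher (e.g. a cleanup request).
    final List<String> launcherEvents = new ArrayList<>();
    private State current = State.ALLOCATED;

    TolerantAttemptMachine() {
        put(State.ALLOCATED, Event.KILL, State.FINAL_SAVING);
        put(State.FINAL_SAVING, Event.UPDATE_SAVED, State.KILLED);
        // The launcher may still report a launch result after the kill was
        // processed, so FINAL_SAVING tolerates it instead of failing.
        ignored.put(State.FINAL_SAVING, EnumSet.of(Event.LAUNCHED, Event.LAUNCH_FAILED));
    }

    private void put(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    State handle(Event event) {
        if (ignored.getOrDefault(current, EnumSet.noneOf(Event.class)).contains(event)) {
            return current; // late event: stay in the same state
        }
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        if (current == State.ALLOCATED && event == Event.KILL) {
            launcherEvents.add("CLEANUP"); // ask the launcher to kill the AM
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        TolerantAttemptMachine attempt = new TolerantAttemptMachine();
        System.out.println(attempt.handle(Event.KILL));         // FINAL_SAVING
        System.out.println(attempt.handle(Event.LAUNCHED));     // FINAL_SAVING
        System.out.println(attempt.handle(Event.UPDATE_SAVED)); // KILLED
        System.out.println(attempt.launcherEvents);             // [CLEANUP]
    }
}
```

Treating the late event as a self-loop keeps the kill path in control. By the time LAUNCHED arrives, cleanup of the AM has already been requested, so the launch result is presumably of no further use to this attempt.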


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-09 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312926#comment-14312926
 ] 

Jian He commented on YARN-933:
--

The variable {{Container amContainer = allocateApplicationAttempt();}} is not 
used. I'm just updating the patch myself.


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313010#comment-14313010
 ] 

Hadoop QA commented on YARN-933:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697404/0001-YARN-933.patch
  against trunk revision fcad031.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6555//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6555//console

This message is automatically generated.

 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 YARN-933.3.patch, YARN-933.patch


 AM max retries configured as 3 on both the client and RM side.
 Step 1: Install a cluster with NMs on 2 machines.
 Step 2: Make ping from the RM machine to the NM1 machine succeed when using the
 IP, but fail when using the hostname.
 Step 3: Execute a job.
 Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, the
 connection is lost.
 Observation:
 ==
 After AppAttempt_1 moved to the FAILED state, the release of the container for
 AppAttempt_1 and the application removal succeeded. A new AppAttempt_2 was
 spawned.
 1. Then a retry for AppAttempt_1 happens again.
 2. The RM again tries to launch AppAttempt_1, which fails with
 InvalidStateTransitonException.
 3. The client exited after AppAttempt_1 finished [but the job is actually
 still running], even though 3 app attempts are configured and the remaining
 app attempts were all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
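The RM log timestamps hint at why the launcher's result arrives so late. Assuming the connect retries are spaced roughly evenly (an assumption: the log shows only a few of them), the arithmetic below compares the launcher's estimated retry budget with the 600-second container expiry. All inputs are copied from the log lines in this report; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;

public class LaunchRetryTimelineSketch {
    // Whole seconds between two HH:mm:ss timestamps taken from the RM log.
    static long secondsBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).getSeconds();
    }

    // Average seconds per connect retry, estimated from two logged retry counts.
    static long secondsPerRetry(String t1, int tries1, String t2, int tries2) {
        return secondsBetween(t1, t2) / (tries2 - tries1);
    }

    public static void main(String[] args) {
        long perRetry = secondsPerRetry("16:35:48", 36, "16:38:36", 44); // 168 s / 8 = 21 s
        int maxRetries = 45;                          // "maxRetries=45" in the log
        long launcherBudget = perRetry * maxRetries;  // rough estimate: 945 s
        long containerExpiry = 600;                   // "Timed out after 600 secs"
        // Attempt 1 was reported done (FAILED) at 16:36:07; the late launcher
        // event was rejected at 16:38:56.
        long lateBy = secondsBetween("16:36:07", "16:38:56"); // 169 s

        System.out.println("estimated launcher budget: " + launcherBudget + " s");
        System.out.println("container expiry: " + containerExpiry + " s");
        System.out.println("launcher result arrived " + lateBy + " s after the attempt failed");
    }
}
```

Under that assumption the launcher can keep retrying for roughly 945 s, well past the 600 s expiry, so the attempt can be failed by container expiry while the launcher is still running, and its result then lands on an attempt that is already FAILED.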
 

[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313227#comment-14313227
 ] 

Hadoop QA commented on YARN-933:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697566/YARN-933.3.patch
  against trunk revision aab459c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 13 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

  org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader
  org.apache.hadoop.mapred.TestLineRecordReader

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6560//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6560//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6560//console

This message is automatically generated.

[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313248#comment-14313248
 ] 

Vinod Kumar Vavilapalli commented on YARN-933:
--

You should not ignore RMAppAttemptEventType.LAUNCHED? We will have to 
explicitly kill the AppAttempt and the AM in this case, right?
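Vinod's question is about what should happen when LAUNCHED arrives after the attempt has already started finishing: since the AM presumably did start, silently ignoring the event could leave it running. Below is a toy sketch of the alternative he suggests, in which a late LAUNCHED in a finishing state records a request to kill the AM while a late LAUNCH_FAILED is simply absorbed. This is not the change committed for this issue; the class, the reduced state set and the kill-request list are hypothetical stand-ins for the real RM cleanup path.

```java
import java.util.ArrayList;
import java.util.List;

public class LateLaunchedHandlingSketch {
    enum State { ALLOCATED, LAUNCHED, FINAL_SAVING, FAILED }
    enum Event { LAUNCHED, LAUNCH_FAILED }

    State current;
    // Stand-in for a hypothetical "kill this attempt's AM container" action.
    final List<String> killRequests = new ArrayList<>();

    LateLaunchedHandlingSketch(State initial) {
        current = initial;
    }

    void handle(Event e, String attemptId) {
        switch (current) {
            case ALLOCATED:
                current = (e == Event.LAUNCHED) ? State.LAUNCHED : State.FAILED;
                return;
            case FINAL_SAVING:
            case FAILED:
                if (e == Event.LAUNCHED) {
                    // The AM did start although the attempt is already finishing:
                    // request that it be killed rather than ignoring the event.
                    killRequests.add(attemptId);
                }
                // A late LAUNCH_FAILED needs no cleanup; the state is unchanged either way.
                return;
            default:
                throw new IllegalStateException("Invalid event: " + e + " at " + current);
        }
    }

    public static void main(String[] args) {
        LateLaunchedHandlingSketch attempt = new LateLaunchedHandlingSketch(State.FINAL_SAVING);
        attempt.handle(Event.LAUNCHED, "appattempt_1");
        attempt.handle(Event.LAUNCH_FAILED, "appattempt_1");
        System.out.println(attempt.current + " " + attempt.killRequests); // FINAL_SAVING [appattempt_1]
    }
}
```

The design choice being weighed is between absorbing the event (simple, but may orphan a live AM) and turning it into an explicit cleanup action (more work, but the RM stays consistent with what is actually running on the NM).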

 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 YARN-933.3.patch, YARN-933.patch


 AM max retries configured as 3 on both the client and RM side.
 Step 1: Install a cluster with NMs on 2 machines.
 Step 2: Make ping from the RM machine to the NM1 machine succeed when using the
 IP, but fail when using the hostname.
 Step 3: Execute a job.
 Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, the
 connection is lost.
 Observation:
 ==
 After AppAttempt_1 moved to the FAILED state, the release of the container for
 AppAttempt_1 and the application removal succeeded. A new AppAttempt_2 was
 spawned.
 1. Then a retry for AppAttempt_1 happens again.
 2. The RM again tries to launch AppAttempt_1, which fails with
 InvalidStateTransitonException.
 3. The client exited after AppAttempt_1 finished [but the job is actually
 still running], even though 3 app attempts are configured and the remaining
 app attempts were all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
 launching appattempt_1373952096466_0056_01. Got exception: 
 java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
  at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
  at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
  at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
  at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
  at