[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328848#comment-14328848 ]

Hudson commented on YARN-933:
-----------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #844 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/844/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt

Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
---------------------------------------------------------------------------------

Key: YARN-933
URL: https://issues.apache.org/jira/browse/YARN-933
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
Fix For: 2.7.0
Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch

AM max retries is configured as 3 on both the client and the RM side.

Step 1: Install a cluster with NMs on 2 machines.
Step 2: Make sure that pinging the NM1 machine from the RM machine succeeds by IP address but fails by hostname.
Step 3: Execute a job.
Step 4: After the AM [ AppAttempt_1 ] has been allocated to the NM1 machine, the connection is lost.

Observation:
============
After AppAttempt_1 has moved to the failed state, the release of the container for AppAttempt_1 and the application removal are successful. A new AppAttempt_2 is spawned.
1. A retry for AppAttempt_1 then happens again.
2. The RM side tries to launch AppAttempt_1 again, and this fails with InvalidStateTransitonException.
3. The client exited after AppAttempt_1 finished [but the job is actually still running], although the number of app attempts is configured as 3 and the remaining app attempts are all spawned and running.

RMLogs:
=======
2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at
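The failure in the log above (an event arriving for an attempt that has already reached FAILED) can be illustrated with a small table-driven state machine. This is only a sketch under stated assumptions: the class AttemptStateMachineSketch, its enums, and its methods are hypothetical and are not Hadoop's StateMachineFactory or RMAppAttemptImpl. Registering self-transitions that ignore late launcher events is one common way to tolerate this kind of race; whether the committed patch does this is not stated in this thread.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Hadoop code.
public class AttemptStateMachineSketch {
    enum State { ALLOCATED, LAUNCHED, FAILED }
    enum Event { LAUNCHED, LAUNCH_FAILED, EXPIRE }

    /** Thrown when no transition is registered for the (state, event) pair. */
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(State state, Event event) {
            super("Invalid event: " + event + " at " + state);
        }
    }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.ALLOCATED;

    void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    /** Registers self-transitions so the given events become no-ops in a state. */
    void ignore(State at, Set<Event> events) {
        for (Event e : events) {
            addTransition(at, e, at);
        }
    }

    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        if (row == null || !row.containsKey(event)) {
            throw new InvalidTransitionException(current, event);
        }
        current = row.get(event);
        return current;
    }

    static AttemptStateMachineSketch build(boolean ignoreLateLauncherEvents) {
        AttemptStateMachineSketch sm = new AttemptStateMachineSketch();
        sm.addTransition(State.ALLOCATED, Event.LAUNCHED, State.LAUNCHED);
        sm.addTransition(State.ALLOCATED, Event.LAUNCH_FAILED, State.FAILED);
        sm.addTransition(State.ALLOCATED, Event.EXPIRE, State.FAILED);
        if (ignoreLateLauncherEvents) {
            // Assumption: late launcher events in FAILED are safe to drop.
            sm.ignore(State.FAILED, EnumSet.of(Event.LAUNCHED, Event.LAUNCH_FAILED));
        }
        return sm;
    }

    public static void main(String[] args) {
        // Strict table: the container expires first, then the launcher reports late.
        AttemptStateMachineSketch strict = build(false);
        strict.handle(Event.EXPIRE);
        try {
            strict.handle(Event.LAUNCH_FAILED);
        } catch (InvalidTransitionException ex) {
            System.out.println(ex.getMessage()); // Invalid event: LAUNCH_FAILED at FAILED
        }

        // Tolerant table: the same late event is ignored and the state is unchanged.
        AttemptStateMachineSketch tolerant = build(true);
        tolerant.handle(Event.EXPIRE);
        System.out.println(tolerant.handle(Event.LAUNCH_FAILED)); // FAILED
    }
}
```

In this sketch the ordering mirrors the log: the container expiry moves the attempt to FAILED at 16:36:07, and the launcher's failure report only arrives at 16:38:56 after its connection retries run out.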
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328826#comment-14328826 ]

Hudson commented on YARN-933:
-----------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/110/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328948#comment-14328948 ]

Hudson commented on YARN-933:
-----------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2042 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2042/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328954#comment-14328954 ]

Hudson commented on YARN-933:
-----------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #101 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/101/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329020#comment-14329020 ]

Hudson commented on YARN-933:
-----------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #111 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/111/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329050#comment-14329050 ]

Hudson commented on YARN-933:
-----------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2061 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2061/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328348#comment-14328348 ]

Hudson commented on YARN-933:
-----------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7158 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7158/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
---------------------------------------------------------------------------------

Key: YARN-933
URL: https://issues.apache.org/jira/browse/YARN-933
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
Fix For: 2.7.0
Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch

AM max retries is configured as 3 on both the client and the RM side.

Step 1: Install a cluster with NMs on 2 machines.
Step 2: Make sure that pinging the NM1 machine from the RM machine succeeds by IP address but fails by hostname.
Step 3: Execute a job.
Step 4: After the AM [AppAttempt_1] was allocated to the NM1 machine, the connection was lost.

Observation:
==
After AppAttempt_1 has moved to the failed state, the release of the container for AppAttempt_1 and the application removal are successful. A new AppAttempt_2 is spawned.
1. A retry for AppAttempt_1 then happens again.
2. The RM again tries to launch AppAttempt_1, which fails with InvalidStateTransitonException.
3. The client exited after AppAttempt_1 finished [although the job is actually still running], even though the configured number of app attempts is 3 and the remaining attempts are all spawned and running.

RMLogs:
==
2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495)
    at
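The "Invalid event: LAUNCH_FAILED at FAILED" error in the RM log above is what a table-driven state machine reports when an event arrives in a state that has no transition registered for it. The following is a minimal sketch of that mechanism only. It does not use the Hadoop StateMachineFactory API; the class name, the reduced state and event sets, and the use of IllegalStateException are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of an attempt state machine (not Hadoop code).
final class AttemptTransitionSketch {
    enum State { ALLOCATED, LAUNCHED, FAILED }
    enum Event { LAUNCHED, LAUNCH_FAILED, EXPIRE }

    // TRANSITIONS.get(state).get(event) gives the next state; a missing entry means "invalid".
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        Map<Event, State> fromAllocated = new EnumMap<>(Event.class);
        fromAllocated.put(Event.LAUNCHED, State.LAUNCHED);
        fromAllocated.put(Event.LAUNCH_FAILED, State.FAILED);
        fromAllocated.put(Event.EXPIRE, State.FAILED);
        TRANSITIONS.put(State.ALLOCATED, fromAllocated);
        // In this sketch FAILED registers no transitions at all.
        TRANSITIONS.put(State.FAILED, new EnumMap<>(Event.class));
    }

    private State current = State.ALLOCATED;

    State getState() {
        return current;
    }

    void handle(Event event) {
        Map<Event, State> row = TRANSITIONS.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            // Stand-in for InvalidStateTransitonException in the log above.
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
    }

    public static void main(String[] args) {
        AttemptTransitionSketch attempt = new AttemptTransitionSketch();
        attempt.handle(Event.EXPIRE);            // container expired, attempt is now FAILED
        try {
            attempt.handle(Event.LAUNCH_FAILED); // late result from the launcher
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the attempt has already reached FAILED when the launcher's delayed result arrives, so the lookup finds no entry and the handler throws. The timing matches the log, where the container expires at 16:36:07 and the launcher gives up at 16:38:56.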
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326235#comment-14326235 ]

Rohith commented on YARN-933:
-----------------------------

[~jianhe] kindly review the updated patch.
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326557#comment-14326557 ]

Jian He commented on YARN-933:
------------------------------

lgtm, +1
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313834#comment-14313834 ]

Hadoop QA commented on YARN-933:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12697671/0004-YARN-933.patch
  against trunk revision b73956f.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6573//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6573//console

This message is automatically generated.
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313727#comment-14313727 ]

Rohith commented on YARN-933:
-----------------------------

Thanks [~jianhe] for updating the patch :-)
To keep the test flow consistent with the RMAppAttemptImpl state sequence, i.e. new -> submitted -> scheduled -> allocated_saving -> allocated -> (kill event) -> final_saving -> killed, I'd like to add {{allocateApplicationAttempt()}}. I have updated the patch to also do the verification at the killed state.
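The state sequence Rohith lists for the test (new, submitted, scheduled, allocated_saving, allocated, kill event, final_saving, killed) can be walked through with a small stand-alone model. This is only a sketch under stated assumptions: it is not TestRMAppAttemptTransitions, the class and the event labels are hypothetical names chosen for illustration, and treating a late LAUNCHED at FINAL_SAVING as a no-op reflects the behaviour discussed in this issue rather than the actual patch contents.

```java
// Hypothetical, simplified model of the attempt lifecycle exercised by the test.
final class AttemptFlowSketch {
    enum State { NEW, SUBMITTED, SCHEDULED, ALLOCATED_SAVING, ALLOCATED, FINAL_SAVING, KILLED }
    enum Event { START, ATTEMPT_ADDED, CONTAINER_ALLOCATED, ATTEMPT_NEW_SAVED, KILL, LAUNCHED, ATTEMPT_UPDATE_SAVED }

    // Returns the state reached from 'state' on 'event', or throws if the pair is not modelled.
    static State next(State state, Event event) {
        switch (state) {
            case NEW:
                if (event == Event.START) return State.SUBMITTED;
                break;
            case SUBMITTED:
                if (event == Event.ATTEMPT_ADDED) return State.SCHEDULED;
                break;
            case SCHEDULED:
                if (event == Event.CONTAINER_ALLOCATED) return State.ALLOCATED_SAVING;
                break;
            case ALLOCATED_SAVING:
                if (event == Event.ATTEMPT_NEW_SAVED) return State.ALLOCATED;
                break;
            case ALLOCATED:
                if (event == Event.KILL) return State.FINAL_SAVING;
                break;
            case FINAL_SAVING:
                if (event == Event.LAUNCHED) return State.FINAL_SAVING; // late event, state unchanged
                if (event == Event.ATTEMPT_UPDATE_SAVED) return State.KILLED;
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + state);
    }

    // Feeds the events in order, starting from NEW, and returns the final state.
    static State run(Event... events) {
        State state = State.NEW;
        for (Event event : events) {
            state = next(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        State end = run(Event.START, Event.ATTEMPT_ADDED, Event.CONTAINER_ALLOCATED,
                Event.ATTEMPT_NEW_SAVED, Event.KILL, Event.LAUNCHED, Event.ATTEMPT_UPDATE_SAVED);
        System.out.println(end);
    }
}
```

Checking the state after the final save, rather than stopping at FINAL_SAVING, corresponds to the "verification at the killed state" mentioned in the comment.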
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313286#comment-14313286 ]

Jian He commented on YARN-933:
------------------------------

bq. You should not ignore RMAppAttemptEventType.LAUNCHED? We will have to explicitly kill the AppAttempt and the AM in this case

The AM here is being killed. The allocated state gets the kill event, kills the AM (sends the clean-up event to the AM launcher), and then moves to the final_saving state.
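Jian He's explanation (the allocated state receives the kill event, asks the AM launcher to clean up, and moves to final_saving, so a LAUNCHED event arriving afterwards can be ignored) can be illustrated with the sketch below. It is an assumption-laden model, not the RMAppAttemptImpl change itself: the class, the "CLEANUP" request string, and the ATTEMPT_UPDATE_SAVED event label are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model (not Hadoop code) of kill handling in ALLOCATED.
final class FinalSavingIgnoreSketch {
    enum State { ALLOCATED, FINAL_SAVING, KILLED }
    enum Event { KILL, LAUNCHED, ATTEMPT_UPDATE_SAVED }

    // Events that are safe to drop once the attempt is already being finalized.
    private static final Set<Event> IGNORED_AT_FINAL_SAVING = EnumSet.of(Event.LAUNCHED);

    private State current = State.ALLOCATED;
    // Records what this model would ask the AM launcher to do.
    private final List<String> launcherRequests = new ArrayList<>();

    State getState() {
        return current;
    }

    List<String> getLauncherRequests() {
        return launcherRequests;
    }

    void handle(Event event) {
        switch (current) {
            case ALLOCATED:
                if (event == Event.KILL) {
                    launcherRequests.add("CLEANUP"); // the AM is killed via the launcher
                    current = State.FINAL_SAVING;
                    return;
                }
                break;
            case FINAL_SAVING:
                if (IGNORED_AT_FINAL_SAVING.contains(event)) {
                    return; // late launcher result, nothing left to do
                }
                if (event == Event.ATTEMPT_UPDATE_SAVED) {
                    current = State.KILLED;
                    return;
                }
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }

    public static void main(String[] args) {
        FinalSavingIgnoreSketch attempt = new FinalSavingIgnoreSketch();
        attempt.handle(Event.KILL);
        attempt.handle(Event.LAUNCHED);
        attempt.handle(Event.ATTEMPT_UPDATE_SAVED);
        System.out.println(attempt.getState() + " " + attempt.getLauncherRequests());
    }
}
```

Ignoring the event is only reasonable here because the clean-up request has already been issued on the kill path, which is the point of the reply to the quoted question.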
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312926#comment-14312926 ]

Jian He commented on YARN-933:
------------------------------

The variable in {{Container amContainer = allocateApplicationAttempt();}} is not used. I'm just updating the patch myself.
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313010#comment-14313010 ] Hadoop QA commented on YARN-933: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697404/0001-YARN-933.patch against trunk revision fcad031. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6555//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6555//console This message is automatically generated. Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING - Key: YARN-933 URL: https://issues.apache.org/jira/browse/YARN-933 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Rohith Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, YARN-933.3.patch, YARN-933.patch am max retries configured as 3 at client and RM side. 
Step 1: Install a cluster with NMs on 2 machines.
Step 2: Make ping by IP from the RM machine to the NM1 machine succeed, but make ping by hostname fail.
Step 3: Execute a job.
Step 4: After the AM [ AppAttempt_1 ] has been allocated to the NM1 machine, the connection is lost.

Observation:
==
After AppAttempt_1 has moved to the failed state, the release of the container for AppAttempt_1 and the application removal are successful. A new AppAttempt_2 is spawned.
1. Then a retry for AppAttempt_1 happens again.
2. The RM side again tries to launch AppAttempt_1, and hence fails with InvalidStateTransitonException.
3. The client exited after AppAttempt_1 finished [but the job is actually still running], although the configured number of app attempts is 3 and the remaining app attempts are all spawned and running.

RMLogs:
==
2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
2013-07-17 16:38:56,207 INFO
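
The failure mode in the report (a late launch event reaching an attempt that has already moved on) can be illustrated with a reduced, table-driven state machine. The following is only a sketch under stated assumptions: the class name, the trimmed state and event sets, the exception type, and the `tolerateLateLaunch` flag are all hypothetical and are not taken from RMAppAttemptImpl or from any of the attached patches. It shows that an event with no registered transition for the current state is rejected, and that registering a self-transition is one possible way to tolerate it.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, reduced model; the real attempt state machine has many more states and events.
class AttemptStateMachineSketch {
    public enum State { ALLOCATED, LAUNCHED, FINAL_SAVING, FAILED }
    public enum Event { LAUNCHED, EXPIRE, ATTEMPT_UPDATE_SAVED }

    // Stand-in for the exception in the report; not the Hadoop class.
    public static class InvalidTransitionException extends RuntimeException {
        public InvalidTransitionException(Event event, State state) {
            super("Invalid event: " + event + " at " + state);
        }
    }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.ALLOCATED;

    public AttemptStateMachineSketch addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
        return this;
    }

    // Applies the event, or throws if no transition is registered for the current state.
    public State handle(Event event) {
        Map<Event, State> row = table.get(current);
        if (row == null || !row.containsKey(event)) {
            throw new InvalidTransitionException(event, current);
        }
        current = row.get(event);
        return current;
    }

    public State getState() {
        return current;
    }

    // Builds the sketch; tolerateLateLaunch is an assumption made for illustration only.
    public static AttemptStateMachineSketch build(boolean tolerateLateLaunch) {
        AttemptStateMachineSketch sm = new AttemptStateMachineSketch()
            .addTransition(State.ALLOCATED, Event.LAUNCHED, State.LAUNCHED)
            .addTransition(State.ALLOCATED, Event.EXPIRE, State.FINAL_SAVING)
            .addTransition(State.FINAL_SAVING, Event.ATTEMPT_UPDATE_SAVED, State.FAILED);
        if (tolerateLateLaunch) {
            // Self-transition: the late event is accepted and the state does not change.
            sm.addTransition(State.FINAL_SAVING, Event.LAUNCHED, State.FINAL_SAVING);
        }
        return sm;
    }

    public static void main(String[] args) {
        AttemptStateMachineSketch strict = build(false);
        strict.handle(Event.EXPIRE);            // container expired, attempt is being saved
        try {
            strict.handle(Event.LAUNCHED);      // launcher reports success too late
        } catch (InvalidTransitionException e) {
            System.out.println(e.getMessage()); // prints "Invalid event: LAUNCHED at FINAL_SAVING"
        }

        AttemptStateMachineSketch tolerant = build(true);
        tolerant.handle(Event.EXPIRE);
        System.out.println(tolerant.handle(Event.LAUNCHED)); // prints "FINAL_SAVING"
    }
}
```

Accepting the late event silently only avoids the exception. Whether the attempt and its AM also need to be cleaned up explicitly in that case is a separate design question that this sketch does not address.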
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313227#comment-14313227 ]

Hadoop QA commented on YARN-933:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697566/YARN-933.3.patch
against trunk revision aab459c.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:
    org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader
    org.apache.hadoop.mapred.TestLineRecordReader

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6560//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6560//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6560//console

This message is automatically generated.
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313248#comment-14313248 ]

Vinod Kumar Vavilapalli commented on YARN-933:

You should not ignore RMAppAttemptEventType.LAUNCHED? We will have to explicitly kill the AppAttempt and the AM in this case, right?

Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

Key: YARN-933
URL: https://issues.apache.org/jira/browse/YARN-933

RMLogs:
==
2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
    at