[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256895#comment-14256895 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #50 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/50/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java

NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

Key: YARN-2340
URL: https://issues.apache.org/jira/browse/YARN-2340
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, scheduler
Affects Versions: 2.4.1
Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
Fix For: 2.7.0
Attachments: 0001-YARN-2340.patch

While a job is in progress, set the queue state to STOPPED and then restart the RM.
Observe that the standby RM fails to come up as active, throwing the NPE below:

2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
    at java.lang.Thread.run(Thread.java:662)
2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
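The trace above shows the NPE surfacing in addApplicationAttempt while an APP_ATTEMPT_ADDED event is handled. One plausible reading, which is an assumption and is not confirmed anywhere in this thread, is that an application bound to a STOPPED queue never gets registered with the scheduler during recovery, so the later attempt lookup returns null. The sketch below only illustrates that general failure pattern and a defensive null check. RecoverySketch and all of its methods are hypothetical names; this is not the CapacityScheduler code and not the committed patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a scheduler; NOT the real CapacityScheduler.
class RecoverySketch {
    // Applications the scheduler has accepted, mapped to their queue name.
    private final Map<String, String> appToQueue = new HashMap<>();
    // Queues that are currently in the STOPPED state.
    private final Set<String> stoppedQueues = new HashSet<>();

    void stopQueue(String queue) {
        stoppedQueues.add(queue);
    }

    // A stopped queue rejects the application, so it is never registered.
    boolean addApplication(String appId, String queue) {
        if (stoppedQueues.contains(queue)) {
            return false;
        }
        appToQueue.put(appId, queue);
        return true;
    }

    // Unguarded lookup: dereferences the result without checking for null,
    // so it throws NullPointerException for an unregistered application.
    int addAttemptUnguarded(String appId) {
        String queue = appToQueue.get(appId);
        return queue.length();
    }

    // Guarded lookup: reports the missing application instead of throwing.
    boolean addAttemptGuarded(String appId) {
        String queue = appToQueue.get(appId);
        if (queue == null) {
            System.out.println("Application " + appId + " not found; skipping attempt");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RecoverySketch scheduler = new RecoverySketch();
        scheduler.stopQueue("a");
        scheduler.addApplication("application_1", "a"); // rejected, queue is stopped
        scheduler.addAttemptGuarded("application_1");   // logs and returns false
        try {
            scheduler.addAttemptUnguarded("application_1");
        } catch (NullPointerException e) {
            System.out.println("Unguarded path threw NullPointerException");
        }
    }
}
```

Whether a real scheduler should skip such an attempt, or instead accept recovered applications on a stopped queue, is a design decision the sketch does not settle; the attached patch and CapacityScheduler.java are the authority on what was actually changed.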
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256907#comment-14256907 ]

Hudson commented on YARN-2340:

SUCCESS: Integrated in Hadoop-Yarn-trunk #784 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/784/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257002#comment-14257002 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #47 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/47/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257007#comment-14257007 ]

Hudson commented on YARN-2340:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1982 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1982/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257048#comment-14257048 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #51 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/51/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257091#comment-14257091 ]

Hudson commented on YARN-2340:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2001 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2001/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1429#comment-1429 ]

Rohith commented on YARN-2340:

[~jianhe] Kindly review the patch
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256362#comment-14256362 ]

Jian He commented on YARN-2340:

looks good, +1
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256584#comment-14256584 ]

Hadoop QA commented on YARN-2340:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch against trunk revision fdf042d.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6173//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6173//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6173//console

This message is automatically generated.
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256631#comment-14256631 ]

Rohith commented on YARN-2340:

So many tests are failing randomly in trunk!
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256637#comment-14256637 ]

Jian He commented on YARN-2340:

right, we should spend time fixing these..
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256641#comment-14256641 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-trunk-Commit #6777 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6777/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251451#comment-14251451 ]

Hadoop QA commented on YARN-2340:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch against trunk revision 1050d42.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6144//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6144//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6144//console

This message is automatically generated.
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251455#comment-14251455 ]

Rohith commented on YARN-2340:

The test failure looks random. In my environment, the test runs successfully.
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246426#comment-14246426 ]

Rohith commented on YARN-2340:

Thanks [~nishan] for reporting this issue. I also ran into a similar situation while testing on trunk code, after which the RM remained in standby.
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246442#comment-14246442 ]

Rohith commented on YARN-2340:

Scenario executed
# Start the YARN cluster and submit a long-running application to the default queue. Initially, RM1 is active.
# *Stop the default queue* on both RM1 and RM2 using -refreshQueues. A queue can be stopped even while an application is running, but it won't accept new application submissions.
# Switch the RM so that RM2 transitions to active. Application recovery fails here because the queue is already stopped. The logs below show the failure, but *the RMAppImpl state is updated to FAILED while the RMAppAttempt final state remains null*. The RM remains in standby.
{noformat}
2014-12-15 11:01:17,813 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: application_1418620667348_0001 with 1 attempts and final state = null
2014-12-15 11:01:17,814 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Recovering attempt: appattempt_1418620667348_0001_01 with final state: null
/. /
2014-12-15 11:01:17,824 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Queue root.default is STOPPED. Cannot accept submission of application: application_1418620667348_0001
2014-12-15 11:01:17,825 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to submit application application_1418620667348_0001 to queue default from user rohith
org.apache.hadoop.security.AccessControlException: Queue root.default is STOPPED. Cannot accept submission of application: application_1418620667348_0001
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.submitApplication(LeafQueue.java:575)
2014-12-15 11:01:17,939 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1418620667348_0001_01
2014-12-15 11:01:17,941 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1418620667348_0001 with final state: FAILED
{noformat}
# After restart, the final state recovered for RMAppImpl is FAILED and for RMAppAttemptImpl is null, as shown below. The RM cannot recover the applications and fails continuously.
{noformat}
2014-12-15 11:01:41,493 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: application_1418620667348_0001 with 1 attempts and final state = FAILED
2014-12-15 11:01:41,494 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Recovering attempt: appattempt_1418620667348_0001_01 with final state: null
{noformat}
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246555#comment-14246555 ]

Rohith commented on YARN-2340:

Some thoughts on fixing this issue. Either of the two options below could work.

1. Straight away invoke the KILL event for the application if it is being submitted into a STOPPED queue while applications are recovering. The KILL event smoothly transitions RMApp/RMAppAttempt to the KILLED state, but an exception is thrown while killing the master container, either because the NMs have not registered with the RM or because the connection is refused when the NM is down.

{{CS#addApplication}}
{code}
    // Submit to the queue
    try {
      queue.submitApplication(applicationId, user, queueName);
    } catch (AccessControlException ace) {
      LOG.info("Failed to submit application " + applicationId + " to queue "
          + queueName + " from user " + user, ace);
      if (isAppRecovering) {
        // Recovering app hit a STOPPED queue: kill it instead of rejecting it
        LOG.info("Killing the application " + applicationId);
        this.rmContext.getDispatcher().getEventHandler()
            .handle(new RMAppEvent(applicationId, RMAppEventType.KILL));
      } else {
        this.rmContext.getDispatcher().getEventHandler()
            .handle(new RMAppRejectedEvent(applicationId, ace.toString()));
      }
      return;
    }
{code}

{{CS#addApplicationAttempt}}
{code}
    SchedulerApplication<FiCaSchedulerApp> application =
        applications.get(applicationAttemptId.getApplicationId());
    // The application was never added because its queue is stopped; skip the attempt
    if (application == null && isAttemptRecovering) {
      LOG.info("Attempt is recovering from an application where Queue is stopped. "
          + applicationAttemptId);
      return;
    }
{code}

2. Introduce a new event type such as APP_RECOVERY_FAILED or APP_SCHEDULER_RECOVERY_FAILED, and trigger it from the scheduler if the app is submitted to a stopped queue while recovering. The transitions would be as below:

AppAttempt : {{NEW to LAUNCHED}}
App : {{NEW to ACCEPTED}}
App : {{ACCEPTED to FINAL_SAVING}} on event APP_RECOVERY_FAILED or APP_SCHEDULER_RECOVERY_FAILED
AppAttempt : {{LAUNCHED to FINAL_SAVING}}
AppAttempt : {{FINAL_SAVING to FAILED}}
App : {{FINAL_SAVING to FAILED}}

Please give your suggestions/thoughts.
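To make the App-side transitions of option 2 concrete, here is a minimal, standalone sketch of them as a table-driven state machine. It is not ResourceManager code: the class name and the RECOVER and APP_UPDATE_SAVED event names are assumptions made for this example, and only APP_RECOVERY_FAILED is taken from the proposal above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the App-side transitions proposed in option 2.
class AppRecoveryTransitionsSketch {
  enum AppState { NEW, ACCEPTED, FINAL_SAVING, FAILED }

  // RECOVER and APP_UPDATE_SAVED are illustrative names (assumptions);
  // APP_RECOVERY_FAILED is the new event suggested in the comment.
  enum AppEvent { RECOVER, APP_RECOVERY_FAILED, APP_UPDATE_SAVED }

  private static final Map<AppState, Map<AppEvent, AppState>> TABLE =
      new EnumMap<AppState, Map<AppEvent, AppState>>(AppState.class);

  static {
    // App : NEW to ACCEPTED
    addTransition(AppState.NEW, AppEvent.RECOVER, AppState.ACCEPTED);
    // App : ACCEPTED to FINAL_SAVING on APP_RECOVERY_FAILED
    addTransition(AppState.ACCEPTED, AppEvent.APP_RECOVERY_FAILED, AppState.FINAL_SAVING);
    // App : FINAL_SAVING to FAILED
    addTransition(AppState.FINAL_SAVING, AppEvent.APP_UPDATE_SAVED, AppState.FAILED);
  }

  private static void addTransition(AppState from, AppEvent on, AppState to) {
    Map<AppEvent, AppState> row = TABLE.get(from);
    if (row == null) {
      row = new EnumMap<AppEvent, AppState>(AppEvent.class);
      TABLE.put(from, row);
    }
    row.put(on, to);
  }

  // Returns the next state, or throws if the table has no such transition.
  static AppState next(AppState current, AppEvent event) {
    Map<AppEvent, AppState> row = TABLE.get(current);
    if (row == null || !row.containsKey(event)) {
      throw new IllegalStateException("Invalid event " + event + " in state " + current);
    }
    return row.get(event);
  }

  public static void main(String[] args) {
    AppState s = AppState.NEW;
    s = next(s, AppEvent.RECOVER);
    s = next(s, AppEvent.APP_RECOVERY_FAILED);
    s = next(s, AppEvent.APP_UPDATE_SAVED);
    System.out.println("Final app state: " + s);
  }
}
```

The point of the table is that a recovering app reaches FAILED through FINAL_SAVING, so the final state is persisted before it becomes visible; any event not listed for a state is rejected rather than silently ignored.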
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247415#comment-14247415 ]

Jian He commented on YARN-2340:

Today, the semantics of stopping a queue is to let the existing applications run to completion. We should retain the same semantics for RM restart as well. In this case, I think we need to ignore this exception and continue, because the application was accepted before the queue was changed to STOPPED. A similar problem could happen if we change the application ACL and restart the RM while the application is running.
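The suggestion above (ignore the exception for recovering applications and continue) can be sketched in isolation as follows. This is a toy model under stated assumptions, not the CapacityScheduler implementation: ToyQueue, ToyScheduler and QueueStoppedException are hypothetical stand-ins for LeafQueue, CapacityScheduler and AccessControlException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a stopped queue rejects new apps but lets recovered apps continue.
class RecoveryIgnoreSketch {
  // Stand-in for the AccessControlException thrown by a STOPPED queue.
  static class QueueStoppedException extends Exception {
    QueueStoppedException(String msg) { super(msg); }
  }

  static class ToyQueue {
    final String name;
    boolean stopped;

    ToyQueue(String name) { this.name = name; }

    void submitApplication(String appId) throws QueueStoppedException {
      if (stopped) {
        throw new QueueStoppedException("Queue " + name
            + " is STOPPED. Cannot accept submission of application: " + appId);
      }
    }
  }

  static class ToyScheduler {
    final ToyQueue queue;
    final Map<String, String> applications = new HashMap<String, String>(); // appId -> user

    ToyScheduler(ToyQueue queue) { this.queue = queue; }

    // Returns true if the app is tracked by the scheduler afterwards.
    boolean addApplication(String appId, String user, boolean isAppRecovering) {
      try {
        queue.submitApplication(appId);
      } catch (QueueStoppedException e) {
        if (!isAppRecovering) {
          return false; // a genuinely new submission is rejected
        }
        // Recovered app: it was accepted before the queue was stopped, so continue.
      }
      applications.put(appId, user);
      return true;
    }

    // The lookup that returned null in the reported NPE now finds the recovered app.
    boolean addApplicationAttempt(String appId) {
      return applications.get(appId) != null;
    }
  }
}
```

With this shape, a recovered application is still registered in the scheduler's application map, so the later attempt lookup does not come back null, while fresh submissions to the stopped queue are rejected as before.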