[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256895#comment-14256895 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #50 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/50/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java

NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

Key: YARN-2340
URL: https://issues.apache.org/jira/browse/YARN-2340
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, scheduler
Affects Versions: 2.4.1
Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
Fix For: 2.7.0
Attachments: 0001-YARN-2340.patch

While a job is in progress, set the queue state to STOPPED and then restart the RM.
Observe that the standby RM fails to come up as active and throws the NPE below:

2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
    at java.lang.Thread.run(Thread.java:662)
2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
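The trace above shows the NPE surfacing in CapacityScheduler.addApplicationAttempt while the APP_ATTEMPT_ADDED event is handled. One plausible reading, which this thread does not confirm, is that the recovered application was never registered with the scheduler because its queue was STOPPED, so a later lookup returned null and was dereferenced. The toy model below only illustrates that failure pattern and a defensive null check. Every class and method name in it (StoppedQueueRecoverySketch, ToyScheduler, addAttemptUnguarded, addAttemptGuarded) is hypothetical; it is not Hadoop code and not the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model only. These types are hypothetical stand-ins and do NOT
 * reproduce the real CapacityScheduler or the YARN-2340 patch.
 */
final class StoppedQueueRecoverySketch {

    enum QueueState { RUNNING, STOPPED }

    /** Minimal scheduler that remembers which queue each application was accepted into. */
    static final class ToyScheduler {
        private final Map<String, QueueState> queues = new HashMap<>();
        private final Map<String, String> appToQueue = new HashMap<>();

        void setQueueState(String queue, QueueState state) {
            queues.put(queue, state);
        }

        /** Registers the application only if the queue is RUNNING; returns whether it was accepted. */
        boolean addApplication(String appId, String queue) {
            if (queues.get(queue) != QueueState.RUNNING) {
                return false; // rejected: nothing is recorded for this application
            }
            appToQueue.put(appId, queue);
            return true;
        }

        /** Unguarded variant: throws NullPointerException for an application that was never registered. */
        String addAttemptUnguarded(String appId) {
            String queue = appToQueue.get(appId);
            return queue.trim(); // dereferences null when the lookup missed
        }

        /** Guarded variant: reports the missing application instead of dereferencing null. */
        boolean addAttemptGuarded(String appId) {
            String queue = appToQueue.get(appId);
            if (queue == null) {
                System.out.println("Application " + appId + " not found in scheduler; skipping attempt");
                return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.setQueueState("a", QueueState.STOPPED);

        // "Recovery" of an application whose queue was stopped before the restart.
        boolean accepted = scheduler.addApplication("app_0014", "a");
        System.out.println("accepted = " + accepted);

        try {
            scheduler.addAttemptUnguarded("app_0014");
        } catch (NullPointerException e) {
            System.out.println("unguarded path threw NullPointerException");
        }

        System.out.println("guarded result = " + scheduler.addAttemptGuarded("app_0014"));
    }
}
```

In this sketch the guarded variant lets the caller carry on after a missed lookup, which is the general shape a fix for a crash during recovery might take. Whether the actual patch works this way has to be checked against 0001-YARN-2340.patch.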
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256907#comment-14256907 ]

Hudson commented on YARN-2340:

SUCCESS: Integrated in Hadoop-Yarn-trunk #784 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/784/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257002#comment-14257002 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #47 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/47/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257007#comment-14257007 ]

Hudson commented on YARN-2340:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1982 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1982/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257048#comment-14257048 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #51 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/51/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257091#comment-14257091 ]

Hudson commented on YARN-2340:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2001 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2001/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1429#comment-1429 ]

Rohith commented on YARN-2340:

[~jianhe] Kindly review the patch
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256362#comment-14256362 ]

Jian He commented on YARN-2340:

looks good, +1
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256584#comment-14256584 ]

Hadoop QA commented on YARN-2340:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch
against trunk revision fdf042d.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6173//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6173//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6173//console

This message is automatically generated.
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256631#comment-14256631 ]

Rohith commented on YARN-2340:

So many tests are failing randomly in trunk!!
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256637#comment-14256637 ]

Jian He commented on YARN-2340:

right, we should spend time fixing these..
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256641#comment-14256641 ]

Hudson commented on YARN-2340:

FAILURE: Integrated in Hadoop-trunk-Commit #6777 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6777/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251451#comment-14251451 ]

Hadoop QA commented on YARN-2340:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch
against trunk revision 1050d42.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6144//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6144//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6144//console

This message is automatically generated.
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251455#comment-14251455 ]

Rohith commented on YARN-2340:
------------------------------

The test failure looks random. In my environment, the test runs successfully.
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246555#comment-14246555 ]

Rohith commented on YARN-2340:
------------------------------

Some thoughts on fixing this issue. Either of the two options below could work:

1. Immediately invoke the KILL event for an application if it is being submitted to a STOPPED queue during application recovery. The KILL event smoothly transitions RMApp/RMAppAttempt to the KILLED state. However, an exception is thrown while killing the master container, because either the NMs have not yet registered with the RM or the connection is refused when the NM is down.

{{CS#addApplication}}
{code}
// Submit to the queue
try {
  queue.submitApplication(applicationId, user, queueName);
} catch (AccessControlException ace) {
  LOG.info("Failed to submit application " + applicationId + " to queue "
      + queueName + " from user " + user, ace);
  if (isAppRecovering) {
    // Recovered app targets a STOPPED queue: kill it instead of rejecting it
    LOG.info("Killing the application " + applicationId);
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(applicationId, RMAppEventType.KILL));
  } else {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppRejectedEvent(applicationId, ace.toString()));
  }
  return;
}
{code}

{{CS#addApplicationAttempt}}
{code}
SchedulerApplication<FiCaSchedulerApp> application =
    applications.get(applicationAttemptId.getApplicationId());
// The app was never added to the scheduler (its queue is stopped), so skip the attempt
if (application == null && isAttemptRecovering) {
  LOG.info("Attempt is recovering from an application where Queue is stopped. "
      + applicationAttemptId);
  return;
}
{code}

2. Introduce a new event type such as APP_RECOVERY_FAILED or APP_SCHEDULER_RECOVERY_FAILED, triggered from the scheduler if an app is submitted to a stopped queue while recovering. The transitions would be as follows:

AppAttempt : {{NEW to LAUNCHED}}
App : {{NEW to ACCEPTED}}
App : {{ACCEPTED to FINAL_SAVING}} on event APP_RECOVERY_FAILED or APP_SCHEDULER_RECOVERY_FAILED
AppAttempt : {{LAUNCHED to FINAL_SAVING}}
AppAttempt : {{FINAL_SAVING to FAILED}}
App : {{FINAL_SAVING to FAILED}}

Please give your suggestions/thoughts.
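The App-side transitions listed under option 2 in Rohith's comment above can be read as a small transition table. The following is only a rough, self-contained sketch of that reading, not YARN code: ToyAppState, ToyAppEvent and ToyAppStateMachine are hypothetical names, and the ACCEPT and SAVE_DONE events are placeholders invented here because the comment does not say which events drive the first and last transitions. Only APP_RECOVERY_FAILED is taken from the proposal.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the App states named in the proposal.
enum ToyAppState { NEW, ACCEPTED, FINAL_SAVING, FAILED }

// APP_RECOVERY_FAILED is the proposed event; ACCEPT and SAVE_DONE are
// placeholders assumed for this sketch only.
enum ToyAppEvent { ACCEPT, APP_RECOVERY_FAILED, SAVE_DONE }

final class ToyAppStateMachine {
  // Transition table: current state -> (event -> next state).
  private final Map<ToyAppState, Map<ToyAppEvent, ToyAppState>> table =
      new EnumMap<ToyAppState, Map<ToyAppEvent, ToyAppState>>(ToyAppState.class);
  private ToyAppState state = ToyAppState.NEW;

  ToyAppStateMachine() {
    // App : NEW to ACCEPTED
    addTransition(ToyAppState.NEW, ToyAppEvent.ACCEPT, ToyAppState.ACCEPTED);
    // App : ACCEPTED to FINAL_SAVING on event APP_RECOVERY_FAILED
    addTransition(ToyAppState.ACCEPTED, ToyAppEvent.APP_RECOVERY_FAILED,
        ToyAppState.FINAL_SAVING);
    // App : FINAL_SAVING to FAILED
    addTransition(ToyAppState.FINAL_SAVING, ToyAppEvent.SAVE_DONE, ToyAppState.FAILED);
  }

  private void addTransition(ToyAppState from, ToyAppEvent on, ToyAppState to) {
    Map<ToyAppEvent, ToyAppState> row = table.get(from);
    if (row == null) {
      row = new EnumMap<ToyAppEvent, ToyAppState>(ToyAppEvent.class);
      table.put(from, row);
    }
    row.put(on, to);
  }

  // Applies the event, or throws if the table has no transition for it.
  ToyAppState handle(ToyAppEvent event) {
    Map<ToyAppEvent, ToyAppState> row = table.get(state);
    ToyAppState next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new IllegalStateException("No transition from " + state + " on " + event);
    }
    state = next;
    return state;
  }

  ToyAppState getState() {
    return state;
  }
}
```

Feeding ACCEPT, APP_RECOVERY_FAILED and SAVE_DONE in that order walks the toy app from NEW to FAILED; any other event order is rejected with an IllegalStateException, which is the property a new event type would need to preserve in the real state machine.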
[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247415#comment-14247415 ]

Jian He commented on YARN-2340:
-------------------------------

Today, the semantics of stopping a queue are to let the existing applications run to completion. We should retain the same semantics for RM restart as well. In this case, I think we need to ignore this exception and continue, because the application was accepted before the queue was changed to STOPPED. A similar problem could happen if we change the application ACL and restart the RM while the application is running.
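Jian He's suggestion above (ignore the queue's rejection when the application is being recovered, and continue) might look roughly like the sketch below. This is a simplified toy model under stated assumptions, not the committed patch: ToyQueue, ToyScheduler and QueueStoppedException are hypothetical stand-ins for the CapacityScheduler queue, the scheduler, and the AccessControlException raised for a stopped queue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the exception a stopped queue raises on submission.
final class QueueStoppedException extends Exception {
  QueueStoppedException(String message) {
    super(message);
  }
}

// Hypothetical queue that refuses submissions once it is stopped.
final class ToyQueue {
  private final String name;
  private boolean stopped;

  ToyQueue(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }

  void stop() {
    stopped = true;
  }

  void submitApplication(String appId) throws QueueStoppedException {
    if (stopped) {
      throw new QueueStoppedException("Queue " + name + " is STOPPED, cannot accept " + appId);
    }
  }
}

// Hypothetical scheduler tracking applicationId -> queue name.
final class ToyScheduler {
  private final Map<String, String> applications = new HashMap<String, String>();

  // Returns true if the application ends up registered with the scheduler.
  boolean addApplication(String appId, ToyQueue queue, boolean isAppRecovering) {
    try {
      queue.submitApplication(appId);
    } catch (QueueStoppedException e) {
      if (!isAppRecovering) {
        return false; // a brand-new submission to a stopped queue is rejected
      }
      // Recovering: the app was accepted before the queue was stopped,
      // so ignore the exception and let it run to completion.
    }
    applications.put(appId, queue.getName());
    return true;
  }

  // Returns true if the attempt's application is known to the scheduler.
  boolean addApplicationAttempt(String appId) {
    return applications.containsKey(appId);
  }
}
```

With this shape, a recovered application stays registered, so the later attempt lookup finds it instead of hitting the missing-application case behind the NPE, while new submissions to a stopped queue are still rejected.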