[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683225#comment-16683225 ]

Paul Lin commented on YARN-2823:

[~imstefanlee] Hi, I'm facing the same issue with Flink applications. I tried explicitly setting `KeepContainersAcrossApplicationAttempts` to false, but it didn't help. How did you solve the problem in the end? And could you please point me to the code where the default value of KeepContainersAcrossApplicationAttempts is set to true? Thanks a lot!

> NullPointerException in RM HA enabled 3-node cluster
>
> Key: YARN-2823
> URL: https://issues.apache.org/jira/browse/YARN-2823
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Gour Saha
> Assignee: Jian He
> Priority: Critical
> Fix For: 2.6.0
>
> Attachments: YARN-2823.1.patch, logs_with_NPE_in_RM.zip
>
> Branch:
> 2.6.0
> Environment:
> A 3-node cluster with RM HA enabled. The HA setup went pretty smoothly (used
> Ambari), and then HBase was installed using Slider. After some time the RMs went
> down and would not come back up anymore. The following is the NPE we see in both
> RM logs.
> {noformat}
> 2014-09-16 01:36:28,037 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(612)) - Error in handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.transferStateFromPreviousAttempt(SchedulerApplicationAttempt.java:530)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:678)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1015)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:603)
> at java.lang.Thread.run(Thread.java:744)
> 2014-09-16 01:36:28,042 INFO resourcemanager.ResourceManager (ResourceManager.java:run(616)) - Exiting, bbye..
> {noformat}
> All the logs for this 3-node cluster have been uploaded.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113767#comment-16113767 ]

stefanlee commented on YARN-2823:

IMO, the NPE happens when *transferStateFromPreviousAttempt* is *true*, and the value of *transferStateFromPreviousAttempt* depends on *KeepContainersAcrossApplicationAttempts* in *ApplicationSubmissionContext*. I hit this NPE because there is a *FLINK* type application running in my cluster, and I saw that the default value of *KeepContainersAcrossApplicationAttempts* in the Flink code is *true*. So I want to know: if *KeepContainersAcrossApplicationAttempts* is *false*, can this NPE no longer happen? [~jianhe] Thanks.
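The dependency described in this comment can be sketched as a small toy model. This is only an illustration of the commenter's reading, not YARN code: `ToySubmission`, `Context`, `AttemptAddedEvent` and `newAttempt` are hypothetical names, and the rule that a first attempt never transfers state is an assumption made for the sketch. The real wiring inside the ResourceManager may differ.

```java
/** Toy model of the flag dependency described in the comment (all names hypothetical). */
class ToySubmission {
    /** Stand-in for the application's submission settings. */
    static final class Context {
        final boolean keepContainersAcrossAttempts;

        Context(boolean keepContainersAcrossAttempts) {
            this.keepContainersAcrossAttempts = keepContainersAcrossAttempts;
        }
    }

    /** Stand-in for the event a scheduler receives when an attempt is added. */
    static final class AttemptAddedEvent {
        final int attemptNumber;
        final boolean transferStateFromPreviousAttempt;

        AttemptAddedEvent(int attemptNumber, boolean transferStateFromPreviousAttempt) {
            this.attemptNumber = attemptNumber;
            this.transferStateFromPreviousAttempt = transferStateFromPreviousAttempt;
        }
    }

    /**
     * Assumption for this sketch: the first attempt has nothing to inherit,
     * and later attempts simply follow the submission-time flag.
     */
    static AttemptAddedEvent newAttempt(Context ctx, int attemptNumber) {
        boolean transfer = attemptNumber > 1 && ctx.keepContainersAcrossAttempts;
        return new AttemptAddedEvent(attemptNumber, transfer);
    }
}
```

In this model, a retry of an application submitted with the flag set to false never asks the scheduler to transfer state, so the code path named in the stack trace would not be entered. Whether that holds on a real cluster depends on the flag value that actually reaches the ResourceManager.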
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203389#comment-14203389 ]

Hudson commented on YARN-2823:

FAILURE: Integrated in Hadoop-Yarn-trunk #737 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/737/])
YARN-2823. Fixed ResourceManager app-attempt state machine to inform schedulers about previous finished attempts of a running application to avoid expectation mismatch w.r.t. transferred containers. Contributed by Jian He. (vinodkv: rev a5657182a7accebe08cd86e46b4cdeb163d4d1f2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203434#comment-14203434 ]

Hudson commented on YARN-2823:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1927 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1927/])
YARN-2823. Fixed ResourceManager app-attempt state machine to inform schedulers about previous finished attempts of a running application to avoid expectation mismatch w.r.t. transferred containers. Contributed by Jian He. (vinodkv: rev a5657182a7accebe08cd86e46b4cdeb163d4d1f2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203459#comment-14203459 ]

Hudson commented on YARN-2823:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1951 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1951/])
YARN-2823. Fixed ResourceManager app-attempt state machine to inform schedulers about previous finished attempts of a running application to avoid expectation mismatch w.r.t. transferred containers. Contributed by Jian He. (vinodkv: rev a5657182a7accebe08cd86e46b4cdeb163d4d1f2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202315#comment-14202315 ]

Hudson commented on YARN-2823:

FAILURE: Integrated in Hadoop-trunk-Commit #6479 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6479/])
YARN-2823. Fixed ResourceManager app-attempt state machine to inform schedulers about previous finished attempts of a running application to avoid expectation mismatch w.r.t. transferred containers. Contributed by Jian He. (vinodkv: rev a5657182a7accebe08cd86e46b4cdeb163d4d1f2)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202322#comment-14202322 ]

Vinod Kumar Vavilapalli commented on YARN-2823:

bq. I think there is more that we can and should do, but in the near future. In the non-restart control flow, AMs cannot register until the RM knows about the attempt (obviously); this condition is invalidated after a restart. Will file a ticket.

Filed YARN-2829.
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201040#comment-14201040 ]

Jian He commented on YARN-2823:

The problem is that on recovery, if the previous attempt has already finished, we do not add it to the scheduler. When the scheduler then tries to transferStateFromPreviousAttempt for a work-preserving AM restart, it throws an NPE.
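The failure mode described above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the CapacityScheduler code: `ToyScheduler`, `Attempt`, `addAttempt` and the two `recover...` helpers are hypothetical names, and container state is reduced to a single counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a scheduler's per-application attempt bookkeeping (all names hypothetical). */
class ToyScheduler {
    /** Stand-in for a scheduler-side attempt record; state is reduced to a container count. */
    static final class Attempt {
        int liveContainers;

        Attempt(int liveContainers) {
            this.liveContainers = liveContainers;
        }
    }

    // Latest attempt the scheduler knows about, keyed by application id.
    private final Map<String, Attempt> currentAttempt = new HashMap<>();

    /**
     * Registers a new attempt. When transferState is true, containers are
     * carried over from the attempt previously registered for the same app.
     */
    void addAttempt(String appId, Attempt next, boolean transferState) {
        Attempt previous = currentAttempt.get(appId);
        if (transferState) {
            // Throws NullPointerException when no earlier attempt was ever registered.
            next.liveContainers += previous.liveContainers;
        }
        currentAttempt.put(appId, next);
    }

    /** Recovery that skips the finished attempt, mirroring the behaviour described as buggy. */
    static int recoverSkippingFinished() {
        ToyScheduler scheduler = new ToyScheduler();
        Attempt second = new Attempt(0);
        scheduler.addAttempt("app_1", second, true); // attempt 1 finished and was never re-added
        return second.liveContainers;
    }

    /** Recovery that re-adds the finished attempt first, mirroring the described fix. */
    static int recoverIncludingFinished() {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addAttempt("app_1", new Attempt(0), false); // finished attempt, nothing live
        Attempt second = new Attempt(0);
        scheduler.addAttempt("app_1", second, true);
        return second.liveContainers;
    }
}
```

In this sketch `recoverSkippingFinished()` fails with a `NullPointerException` because the lookup of the previous attempt returns null, while `recoverIncludingFinished()` completes because the finished attempt is known to the scheduler before the transfer is requested.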
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201041#comment-14201041 ]

Jian He commented on YARN-2823:

Uploaded a patch to add the previously finished attempt to the scheduler.
[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
[ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201193#comment-14201193 ]

Hadoop QA commented on YARN-2823:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12679967/YARN-2823.1.patch
against trunk revision 75b820c.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5760//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5760//console

This message is automatically generated.