[jira] [Updated] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster

Vinod Kumar Vavilapalli (JIRA) Fri, 07 Nov 2014 09:27:19 -0800

     [ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Vinod Kumar Vavilapalli updated YARN-2823:
------------------------------------------
            Priority: Critical  (was: Major)
    Target Version/s: 2.6.0

Good catch. Marking this critical for 2.6 given it can crash RM.

The patch looks good to me, +1.

I think there is more that we can and should do, but in the near future. In the 
non-restart control flow, AMs cannot register until the RM knows about the 
attempt (obviously); this condition no longer holds after a restart. Will file 
a ticket.
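
To illustrate the kind of failure in the stack trace below, here is a minimal 
sketch in Java. It is not the YARN-2823 patch and uses none of the real YARN 
classes: AttemptTransferSketch, AttemptState, Scheduler and addAttempt are 
hypothetical names, and the sketch assumes the NPE comes from the scheduler 
having no record of a previous attempt (for example, right after a restart) 
when it is asked to transfer state from one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model. None of these types are real YARN classes.
public class AttemptTransferSketch {

    /** Per-attempt bookkeeping held by the scheduler (hypothetical). */
    static final class AttemptState {
        final String attemptId;
        int liveContainers;

        AttemptState(String attemptId, int liveContainers) {
            this.attemptId = attemptId;
            this.liveContainers = liveContainers;
        }

        // Copies container bookkeeping from the previous attempt.
        // Throws NullPointerException if 'previous' is null.
        void transferStateFrom(AttemptState previous) {
            this.liveContainers = previous.liveContainers;
        }
    }

    /** Scheduler that tracks the current attempt of each application. */
    static final class Scheduler {
        private final Map<String, AttemptState> currentAttemptByApp = new HashMap<>();

        AttemptState addAttempt(String appId, String attemptId, boolean transferState) {
            AttemptState attempt = new AttemptState(attemptId, 0);
            AttemptState previous = currentAttemptByApp.get(appId);
            // Assumption: after a restart the scheduler may not know about any
            // earlier attempt, so 'previous' can be null even when a transfer
            // was requested. Guard instead of dereferencing blindly.
            if (transferState && previous != null) {
                attempt.transferStateFrom(previous);
            }
            currentAttemptByApp.put(appId, attempt);
            return attempt;
        }
    }

    public static void main(String[] args) {
        // A freshly (re)started scheduler has no record of earlier attempts.
        Scheduler restarted = new Scheduler();
        AttemptState attempt = restarted.addAttempt("app_1", "attempt_2", true);
        System.out.println(attempt.attemptId + " liveContainers=" + attempt.liveContainers);
    }
}
```

Without the null check, the first addAttempt call on a freshly started 
scheduler would dereference a null previous attempt and throw, which in an 
event-dispatch thread that treats such errors as fatal would take the whole 
process down.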

Checking this in now.

> NullPointerException in RM HA enabled 3-node cluster
> ----------------------------------------------------
>
>                 Key: YARN-2823
>                 URL: https://issues.apache.org/jira/browse/YARN-2823
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Gour Saha
>            Assignee: Jian He
>            Priority: Critical
>         Attachments: YARN-2823.1.patch, logs_with_NPE_in_RM.zip
>
>
> Branch:
> 2.6.0
> Environment: 
> A 3-node cluster with RM HA enabled. The HA setup went smoothly (we used 
> Ambari), and we then installed HBase using Slider. After some time the RMs 
> went down and would not come back up. The following NPE appears in both 
> RM logs.
> {noformat}
> 2014-09-16 01:36:28,037 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(612)) - Error in handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.transferStateFromPreviousAttempt(SchedulerApplicationAttempt.java:530)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:678)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1015)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:603)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-09-16 01:36:28,042 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(616)) - Exiting, bbye..
> {noformat}
> All the logs for this 3-node cluster have been uploaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
