[jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster

stefanlee (JIRA) Thu, 03 Aug 2017 18:25:22 -0700

    [ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113767#comment-16113767 ]


stefanlee commented on YARN-2823:
---------------------------------

IMO, the NPE happens when *transferStateFromPreviousAttempt* is *true*, and the 
value of *transferStateFromPreviousAttempt* depends on 
*KeepContainersAcrossApplicationAttempts* in *ApplicationSubmissionContext*. I 
hit this NPE because there is a *FLINK* type application running in my cluster, 
and I saw that the default value of *KeepContainersAcrossApplicationAttempts* in 
the Flink code is *true*. So I want to know: if 
*KeepContainersAcrossApplicationAttempts* is *false*, can this NPE no longer 
happen? [~jianhe] thanks
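
To make the question concrete, here is a minimal sketch of the relationship described above. It is a simplified, hypothetical model and not the actual YARN code: the class *TransferStateSketch*, the nested *Attempt* type and the method *addAttempt* are invented for illustration. The only parts taken from this issue are the flag name and the fact, visible in the stack trace, that the NPE is thrown while state is transferred from the previous attempt. The null guard is an assumption about what a defensive check could look like, not a description of the attached patch.

```java
/** Simplified, hypothetical model of the flag/transfer relationship; not YARN code. */
public class TransferStateSketch {

    /** Stand-in for a scheduler-side application attempt (hypothetical type). */
    static final class Attempt {
        final int liveContainers;

        Attempt(int liveContainers) {
            this.liveContainers = liveContainers;
        }
    }

    /**
     * Returns how many containers the new attempt inherits.
     *
     * @param previous the previous attempt as known to the scheduler; may be null,
     *                 for example if the scheduler has no record of it
     * @param keepContainersAcrossAttempts models KeepContainersAcrossApplicationAttempts
     */
    static int addAttempt(Attempt previous, boolean keepContainersAcrossAttempts) {
        // In this model the transfer flag is derived directly from the submission flag.
        boolean transferStateFromPreviousAttempt = keepContainersAcrossAttempts;

        if (!transferStateFromPreviousAttempt) {
            // The transfer path is never entered, so 'previous' is never dereferenced.
            return 0;
        }
        if (previous == null) {
            // Assumed defensive guard: without it, the next line would throw an NPE.
            return 0;
        }
        return previous.liveContainers;
    }

    public static void main(String[] args) {
        System.out.println(addAttempt(new Attempt(3), true));  // transfer happens
        System.out.println(addAttempt(null, true));            // guard avoids the NPE
        System.out.println(addAttempt(null, false));           // transfer path skipped
    }
}
```

In this model, a *false* flag skips the transfer path entirely, which is the behaviour I am asking about; whether the real scheduler behaves the same way in every restart/failover case is exactly the open question.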

> NullPointerException in RM HA enabled 3-node cluster
> ----------------------------------------------------
>
>                 Key: YARN-2823
>                 URL: https://issues.apache.org/jira/browse/YARN-2823
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Gour Saha
>            Assignee: Jian He
>            Priority: Critical
>             Fix For: 2.6.0
>
>         Attachments: logs_with_NPE_in_RM.zip, YARN-2823.1.patch
>
>
> Branch:
> 2.6.0
> Environment: 
> A 3-node cluster with RM HA enabled. The HA setup went pretty smoothly (used 
> Ambari), and we then installed HBase using Slider. After some time the RMs went 
> down and would not come back up. The following is the NPE we see in both 
> RM logs.
> {noformat}
> 2014-09-16 01:36:28,037 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(612)) - Error in handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.transferStateFromPreviousAttempt(SchedulerApplicationAttempt.java:530)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:678)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1015)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:603)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-09-16 01:36:28,042 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(616)) - Exiting, bbye..
> {noformat}
> All the logs for this 3-node cluster have been uploaded.



