[ 
https://issues.apache.org/jira/browse/YARN-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744766#comment-16744766
 ] 

Kai Zheng commented on YARN-9198:
---------------------------------

While it is certainly desirable to find and fix the root cause (which isn't 
always possible), it is also worthwhile to have the check so that the 
scheduler and RM can recover sooner instead of being blocked by corrupted state.
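
For illustration only, the kind of defensive check discussed above might look 
like the sketch below. It is not the content of YARN-9198.001.patch and does 
not use FairScheduler's real fields: the class RecoveryGuardSketch, the 
AppRecord type, and the applications map are hypothetical stand-ins. The point 
is that an attempt whose application is missing from the recovered state is 
logged and skipped rather than dereferenced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a scheduler that replays recovered state.
class RecoveryGuardSketch {

    // Stand-in for the scheduler's per-application bookkeeping (assumption).
    static final class AppRecord {
        final String queue;
        int attempts = 0;

        AppRecord(String queue) {
            this.queue = queue;
        }
    }

    private final Map<String, AppRecord> applications = new HashMap<>();

    void addApplication(String appId, String queue) {
        applications.put(appId, new AppRecord(queue));
    }

    // Returns true if the attempt was attached, false if it was skipped
    // because the recovered state has no matching application.
    boolean addApplicationAttempt(String appId) {
        AppRecord app = applications.get(appId);
        if (app == null) {
            // Without this guard, app.attempts++ below would throw an NPE
            // and abort the whole recovery.
            System.err.println("Skipping attempt for unknown application " + appId);
            return false;
        }
        app.attempts++;
        return true;
    }

    int attemptCount(String appId) {
        AppRecord app = applications.get(appId);
        return app == null ? 0 : app.attempts;
    }

    public static void main(String[] args) {
        RecoveryGuardSketch scheduler = new RecoveryGuardSketch();
        scheduler.addApplication("app_1", "root.default");
        System.out.println(scheduler.addApplicationAttempt("app_1"));
        System.out.println(scheduler.addApplicationAttempt("app_missing"));
    }
}
```

In this sketch the caller can keep replaying the remaining recovered entries 
after a skipped attempt, which is the "recover sooner" behaviour argued for 
above.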

+1 for the patch. Could someone else take an additional look? Thanks.

> Corrupted state from a previous version can still cause RM to fail with NPE 
> on FairScheduler
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-9198
>                 URL: https://issues.apache.org/jira/browse/YARN-9198
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 3.1.0, 2.8.5
>            Reporter: Dapeng Sun
>            Assignee: Dapeng Sun
>            Priority: Major
>         Attachments: YARN-9198.001.patch
>
>
> Previously, the RM could fail with an NPE because of the issues tracked in 
> YARN-4347 and YARN-4000. Even after those fixes, FairScheduler still has the 
> same potential issue.
>  
> 201x-xx-xx xx:xx:xx,xxx ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart) - Failed to load/recover state
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java)



