[ https://issues.apache.org/jira/browse/YARN-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743618#comment-16743618 ]
Dapeng Sun commented on YARN-9198:
----------------------------------

{quote}
Not restoring an application is irreversible. There is no way to get that application back. If that would be an application that had been running for some time (like days) processing petabytes of data not restoring the application could be far more costly than some extra down time.
{quote}

Yes, in this scenario we should not skip the failing application. How about adding a config option, with a key like "xxx.resourcemanager.fair-scheduler.skip-error-apps", so that users can choose between two behaviors: "Stop the RM and recover the failing app" or "Skip the error and continue starting the RM"? The option could be false by default. When the exception occurs, the log would show the ID(s) of the failing applications, and the user could decide whether to "fix" or "skip" based on the logs.

> Corrupted state from a previous version can still cause RM to fail with NPE on FairScheduler
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-9198
>                 URL: https://issues.apache.org/jira/browse/YARN-9198
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>   Affects Versions: 3.1.0, 2.8.5
>            Reporter: Dapeng Sun
>            Assignee: Dapeng Sun
>            Priority: Major
>         Attachments: YARN-9198.001.patch
>
>
> Previously, RM could fail with an NPE due to YARN-4347 and YARN-4000. After these
> fixes, FairScheduler still has the same potential issue.
>
> 201x-xx-xx xx:xx:xx,xxx ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart) - Failed to load/recover state
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
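
For illustration only, the skip-or-fail switch proposed in the comment could behave roughly as sketched below. This is not the attached patch and does not use the real ResourceManager or FairScheduler classes. The class name `SkipErrorAppsSketch`, the method `recoverAll`, and the `recoverOne` callback are hypothetical, and the key keeps the "xxx" placeholder prefix exactly as written in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of the proposed behavior; not actual YARN code. */
final class SkipErrorAppsSketch {

    // Key as written in the comment; the "xxx" prefix is a placeholder there too.
    static final String SKIP_ERROR_APPS_KEY =
        "xxx.resourcemanager.fair-scheduler.skip-error-apps";

    // The comment proposes false as the default: stop rather than skip.
    static final boolean DEFAULT_SKIP_ERROR_APPS = false;

    /**
     * Recovers each application in order. When recovery of one application
     * throws, the application id is always logged. If skipErrorApps is false
     * the failure is rethrown and startup stops; if true the application is
     * skipped and recovery continues. Returns the ids that were skipped.
     */
    static List<String> recoverAll(List<String> appIds,
                                   Consumer<String> recoverOne,
                                   boolean skipErrorApps) {
        List<String> skipped = new ArrayList<>();
        for (String appId : appIds) {
            try {
                recoverOne.accept(appId);
            } catch (RuntimeException e) {
                System.err.println("Failed to recover application " + appId + ": " + e);
                if (!skipErrorApps) {
                    throw new IllegalStateException(
                        "Failed to recover application " + appId + "; set "
                            + SKIP_ERROR_APPS_KEY + "=true to skip it", e);
                }
                skipped.add(appId);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Made-up ids; the second one simulates corrupted state.
        List<String> appIds = Arrays.asList("app_1", "app_2", "app_3");
        Consumer<String> recoverOne = id -> {
            if (id.equals("app_2")) {
                throw new NullPointerException("corrupted state");
            }
        };
        System.out.println("Skipped: " + recoverAll(appIds, recoverOne, true));
    }
}
```

With the flag left at its default of false, the first failing application stops recovery and its id appears in the log, so the operator can decide whether to repair the state or restart with the flag enabled.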