[
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785845#comment-15785845
]
Wangda Tan commented on YARN-6031:
----------------------------------
Thanks [~Ying Zhang] for updating patch, and thanks comments from [~templedf] /
[~sunilg].
I agree with approach suggested by [~sunilg]. We can create RMAppImpl when
InvalidResourceRequest thrown (For example, create a null ResourceRequest), and
print proper error message
After {{createAndPopulateNewRMApp}}, we can check if (amRequest == null) and
(not unmanaged-am), if yes, send APP_REJECTED message, just like handling
errors when credential parse error found, see {{submitApplication}}.
Thoughts?
> Application recovery failed after disabling node label
> ------------------------------------------------------
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 2.8.0
> Reporter: Ying Zhang
> Assignee: Ying Zhang
> Priority: Minor
> Attachments: YARN-6031.001.patch
>
>
> Here is the repro steps:
> Enable node label, restart RM, configure CS properly, and run some jobs;
> Disable node label, restart RM, and the following exception thrown:
> {noformat}
> Caused by:
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException:
> Invalid resource request, node label not enabled but request contains label
> expression
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 10 more
> {noformat}
> During RM restart, application recovery failed due to that application had
> node label expression specified while node label has been disabled.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]