[
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783397#comment-15783397
]
Bibin A Chundatt edited comment on YARN-6031 at 12/28/16 6:08 PM:
------------------------------------------------------------------
As [~sunilg] mentioned earlier, ignoring the application could leave a stale
application in the state store.
[~Ying Zhang] IIUC, skipping the validation during recovery should also work.
{code}
private static void validateResourceRequest(ResourceRequest resReq,
    Resource maximumResource, QueueInfo queueInfo, RMContext rmContext)
    throws InvalidResourceRequestException {
  Configuration conf = rmContext.getYarnConfiguration();
  // If Node label is not enabled throw exception
  if (null != conf && !YarnConfiguration.areNodeLabelsEnabled(conf)) {
    String labelExp = resReq.getNodeLabelExpression();
    if (!(RMNodeLabelsManager.NO_LABEL.equals(labelExp)
        || null == labelExp)) {
      throw new InvalidLabelResourceRequestException(
          "Invalid resource request, node label not enabled "
              + "but request contains label expression");
    }
  }
{code}
Thoughts??
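To make the "skip validation on recovery" idea concrete, here is a minimal, self-contained sketch. It does not use the real YARN classes: {{LabelValidationSketch}}, {{InvalidLabelRequestException}} and the {{isRecovery}} flag are hypothetical stand-ins, and {{NO_LABEL}} is assumed to be the empty string. How the recovery flag would actually reach the validation code in the RM is not covered here.

```java
// Hypothetical, simplified stand-ins for illustration; not the real YARN classes.
final class LabelValidationSketch {

    // Assumed to mirror RMNodeLabelsManager.NO_LABEL (the empty string).
    static final String NO_LABEL = "";

    // Stand-in for InvalidLabelResourceRequestException.
    static final class InvalidLabelRequestException extends Exception {
        InvalidLabelRequestException(String msg) {
            super(msg);
        }
    }

    /**
     * Rejects a labelled request when node labels are disabled,
     * unless the application is being recovered from the state store.
     *
     * @param labelExpression   label expression of the request, may be null
     * @param nodeLabelsEnabled whether node labels are enabled in the configuration
     * @param isRecovery        hypothetical flag: true while the RM is recovering apps
     */
    static void validateLabelExpression(String labelExpression,
            boolean nodeLabelsEnabled, boolean isRecovery)
            throws InvalidLabelRequestException {
        // Nothing to check when labels are enabled; on recovery the check is skipped.
        if (nodeLabelsEnabled || isRecovery) {
            return;
        }
        boolean hasLabel =
                labelExpression != null && !NO_LABEL.equals(labelExpression);
        if (hasLabel) {
            throw new InvalidLabelRequestException(
                    "Invalid resource request, node label not enabled "
                            + "but request contains label expression");
        }
    }
}
```

With this shape, a new submission carrying a label is still rejected when node labels are disabled, while a recovered application with the same label expression passes through.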
{quote}
The current fact is (with or without this fix): application submitted with node
label expression explicitly specified will fail during recovery
{quote}
IMHO this should be acceptable, since any application submitted with labels
while the feature is disabled gets rejected.
Solution 2:
We could ignore the labels, or reset them to the default, in the
ResourceRequest when node labels are disabled. I haven't looked at the impact
of this.
Elaborate testing would be needed to see how metrics are impacted.
The disadvantage is that the client will never know that the reset happened on
the RM side.
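A rough sketch of Solution 2, again with hypothetical stand-ins rather than the real YARN types: {{LabelResetSketch}}, its nested {{Request}} class and {{resetLabelIfDisabled}} are invented for illustration, and {{NO_LABEL}} is assumed to be the empty string. The boolean return value is one possible way for the RM to at least log that a reset happened; it does not by itself solve the client-visibility problem.

```java
// Hypothetical, simplified stand-ins for illustration; not the real YARN classes.
final class LabelResetSketch {

    // Assumed to mirror RMNodeLabelsManager.NO_LABEL (the empty string).
    static final String NO_LABEL = "";

    // Minimal stand-in for ResourceRequest, holding only the label expression.
    static final class Request {
        private String nodeLabelExpression;

        Request(String nodeLabelExpression) {
            this.nodeLabelExpression = nodeLabelExpression;
        }

        String getNodeLabelExpression() {
            return nodeLabelExpression;
        }

        void setNodeLabelExpression(String nodeLabelExpression) {
            this.nodeLabelExpression = nodeLabelExpression;
        }
    }

    /**
     * Resets the label expression to the default when node labels are disabled.
     *
     * @return true if the request carried a label and it was reset,
     *         so the caller can log or report the change
     */
    static boolean resetLabelIfDisabled(Request request, boolean nodeLabelsEnabled) {
        if (nodeLabelsEnabled) {
            return false; // labels are honoured as submitted
        }
        String labelExp = request.getNodeLabelExpression();
        if (labelExp == null || NO_LABEL.equals(labelExp)) {
            return false; // nothing to reset
        }
        request.setNodeLabelExpression(NO_LABEL);
        return true;
    }
}
```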
YARN-4562 will try to skip loading the label configuration when node labels
are disabled.
[~templedf] I do agree that the admin would need some way to dump application
info when recovery fails, so that a bulk update of the state store is possible.
> Application recovery failed after disabling node label
> ------------------------------------------------------
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 2.8.0
> Reporter: Ying Zhang
> Assignee: Ying Zhang
> Priority: Minor
> Attachments: YARN-6031.001.patch
>
>
> Here are the repro steps:
> Enable node labels, restart RM, configure CS properly, and run some jobs;
> Disable node labels, restart RM, and the following exception is thrown:
> {noformat}
> Caused by: org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid resource request, node label not enabled but request contains label expression
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 10 more
> {noformat}
> During RM restart, application recovery failed because the application had a
> node label expression specified while node labels had been disabled.