[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857371#comment-15857371
 ] 

Ying Zhang commented on YARN-6031:
----------------------------------

Hi [~sunilg], sorry for the late reply (was out for the Spring Festival 
holiday). Here is the patch for branch-2.8, please have a look.
I've found a problem with the test case when making the patch for branch-2.8. 
TestRMRestart runs all test cases for CapacityScheduler and FairScheduler 
respectively, and this test case can only run successfully for 
CapacityScheduler since it involves running application with node label 
specified. On trunk, we don't see this problem because due to YARN-4805, 
TestRMRestart now only runs for CapacityScheduler. I've modified the test case 
a little bit to just run when it is CapacityScheduler.
{code}
  public void testRMRestartAfterNodeLabelDisabled() throws Exception {
    // Skip this test case if it is not CapacityScheduler since NodeLabel is
    // not fully supported yet for FairScheduler and others.
    if (!getSchedulerType().equals(SchedulerType.CAPACITY)) {
      return;
    }
...
{code}
We should probably make this change to trunk too. Let me know you want to make 
the change through this JIRA, or I need to open another JIRA to address it?

> Application recovery has failed when node label feature is turned off during 
> RM recovery
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-6031
>                 URL: https://issues.apache.org/jira/browse/YARN-6031
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Ying Zhang
>            Assignee: Ying Zhang
>            Priority: Minor
>         Attachments: YARN-6031.001.patch, YARN-6031.002.patch, 
> YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, 
> YARN-6031.006.patch, YARN-6031.007.patch
>
>
> Here is the repro steps:
> Enable node label, restart RM, configure CS properly, and run some jobs;
> Disable node label, restart RM, and the following exception thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         ... 10 more
> {noformat}
> During RM restart, application recovery failed due to that application had 
> node label expression specified while node label has been disabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to