[ https://issues.apache.org/jira/browse/YARN-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suma Shivaprasad updated YARN-7643:
-----------------------------------
    Attachment: YARN-7643.3.patch

Thanks a lot [~sunilg] for the detailed review!
{quote} 1 here
  void replaceQueueFromPlacementContext(
      ApplicationPlacementContext placementContext,
      ApplicationSubmissionContext context) {
    // Set it to ApplicationSubmissionContext
    //apply queue mapping only to new application submissions
    if (placementContext != null && !StringUtils.equalsIgnoreCase(
        context.getQueue(), placementContext.getQueue())) {
      LOG.info("Placed application=" + context.getApplicationId() +
          " to queue=" + placementContext.getQueue() + ", original queue="
          + context.getQueue());
      context.setQueue(placementContext.getQueue());
    }
  }
The queue is already updated in the submission context after placement, during 
application submission. So during recovery we already have the mapped queue 
name. Hence UserGroupMappingPlacementRule.getPlacementForApp will have the 
correct mapped queue name, but we still redo the same action. Ideally, the 
current issue happened because the event below has to be fired from RMAppImpl 
to the scheduler, and placementContext will be null in the current recovery 
case (this might break for normal user-mapping also?).
      app.scheduler.handle(
          new AppAddedSchedulerEvent(app.user, app.submissionContext, true,
              app.applicationPriority, app.placementContext));
A couple of suggestions:
a. Could we save placementContext under app data in the state store?
b. While recomputing placeApplication, could we bypass some API calls from 
PlacementManager, as we already have the mapped queue name?
{quote}
Good catch, [~sunilg]! I have fixed the case where asc.queue = 'default' and 
added a test case to validate it.
a. Since queue mappings are available in the configuration, we regenerate the 
placement context on recovery instead of saving it, as we discussed.
b. This is a bigger change and may need a separate JIRA to add new APIs to 
PlacementRule.
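
To illustrate point (a), here is a minimal sketch of regenerating the placement 
on recovery from configured mappings instead of reading it from the state store. 
It is not the patch itself: RecoveryPlacementSketch, PlacementContext, 
addMapping and resolveOnRecovery are hypothetical stand-ins, and the sketch 
assumes a mapping of the form "user -> parent.<user>", where the leaf queue is 
named after the user.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; not the YARN classes.
class RecoveryPlacementSketch {

  /** Minimal stand-in for a placement context: leaf queue plus its parent. */
  static final class PlacementContext {
    final String queue;
    final String parentQueue;

    PlacementContext(String queue, String parentQueue) {
      this.queue = queue;
      this.parentQueue = parentQueue;
    }
  }

  // user -> parent queue; stands in for the queue mappings in configuration.
  private final Map<String, String> userToParentQueue = new HashMap<>();

  void addMapping(String user, String parentQueue) {
    userToParentQueue.put(user, parentQueue);
  }

  /**
   * Recomputes the placement for a user from the configured mappings.
   * Assumption: the leaf queue is named after the user.
   * Returns null when no mapping applies.
   */
  PlacementContext placeApplication(String user) {
    String parent = userToParentQueue.get(user);
    return parent == null ? null : new PlacementContext(user, parent);
  }

  /**
   * On recovery the stored queue is already the mapped leaf name, so only the
   * parent has to be rediscovered. Returns "parent.leaf", or the stored queue
   * unchanged when no mapping applies or the mapping names a different leaf.
   */
  String resolveOnRecovery(String user, String storedQueue) {
    PlacementContext ctx = placeApplication(user);
    if (ctx == null || !ctx.queue.equalsIgnoreCase(storedQueue)) {
      return storedQueue;
    }
    return ctx.parentQueue + "." + ctx.queue;
  }

  public static void main(String[] args) {
    RecoveryPlacementSketch sketch = new RecoveryPlacementSketch();
    sketch.addMapping("user1", "managedParent");
    System.out.println(sketch.resolveOnRecovery("user1", "user1"));
    System.out.println(sketch.resolveOnRecovery("user2", "default"));
  }
}
```

The point of the sketch is only that the parent queue can be rediscovered from 
configuration, so nothing extra needs to be persisted with the application.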
{quote}
2 Could we optimize addApplicationOnRecovery in CS further? The multiple if 
checks are a bit confusing. Maybe we can create getQueueWithMappings and, 
instead of calling getQueue from addApplication/OnRecovery, call it to get the 
queue and do the mapping if needed. Just a bit of refactoring.
{quote}
Fixed
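
For reference, a rough sketch of the shape of that refactoring. This is an 
assumption about the structure, not the code in the attached patch: 
QueueLookupSketch and its fields are hypothetical, and the signature of 
getQueueWithMappings is guessed from the review comment. Both entry points 
share one lookup that auto-creates the leaf queue when the placement names a 
parent that allows it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names and signatures are assumptions, not the patch.
class QueueLookupSketch {
  private final Set<String> leafQueues = new HashSet<>();
  private final Set<String> managedParents = new HashSet<>();

  void addLeafQueue(String name) {
    leafQueues.add(name);
  }

  void addManagedParent(String name) {
    managedParents.add(name);
  }

  /** Plain lookup: the queue name if it exists, else null. */
  String getQueue(String name) {
    return leafQueues.contains(name) ? name : null;
  }

  /**
   * Lookup that also applies the mapping: if the leaf is missing and the
   * placement names a parent that allows auto-creation, create the leaf first.
   * mappedParent may be null when no placement context is available.
   */
  String getQueueWithMappings(String name, String mappedParent) {
    String queue = getQueue(name);
    if (queue == null && mappedParent != null
        && managedParents.contains(mappedParent)) {
      leafQueues.add(name); // auto-create the leaf queue
      queue = name;
    }
    return queue;
  }

  // Both entry points share the same lookup, so neither repeats the checks.
  boolean addApplication(String queueName, String mappedParent) {
    return getQueueWithMappings(queueName, mappedParent) != null;
  }

  boolean addApplicationOnRecovery(String queueName, String mappedParent) {
    return getQueueWithMappings(queueName, mappedParent) != null;
  }
}
```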

> Handle recovery of applications on auto-created leaf queues
> -----------------------------------------------------------
>
>                 Key: YARN-7643
>                 URL: https://issues.apache.org/jira/browse/YARN-7643
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>         Attachments: YARN-7643.1.patch, YARN-7643.2.patch, YARN-7643.3.patch
>
>
> CapacityScheduler application recovery should auto-create the leaf queue if 
> it doesn't exist. Also, RMAppManager needs to set the queue-mapping placement 
> context so that the scheduler has the necessary placement context to recreate 
> the queue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)