[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987105#comment-16987105 ]

Eric Payne commented on YARN-9992:
----------------------------------

The code changes look fine, but I'm still trying to understand what differs 
between trunk and branch-2. These changes are not in trunk, yet something in 
trunk picks up resource-types.xml in the CS init path.

> Max allocation per queue is zero for custom resource types on RM startup
> ------------------------------------------------------------------------
>
>                 Key: YARN-9992
>                 URL: https://issues.apache.org/jira/browse/YARN-9992
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
>         Attachments: YARN-9992.001.patch
>
>
> Found an issue where GPU requests on a newly booted RM cannot be scheduled. 
> The request fails with the following exception from 
> SchedulerUtils#throwInvalidResourceException:
> {noformat}
> throw new InvalidResourceRequestException(
>     "Invalid resource request, requested resource type=[" + reqResourceName
>         + "] < 0 or greater than maximum allowed allocation. Requested "
>         + "resource=" + reqResource + ", maximum allowed allocation="
>         + availableResource
>         + ", please note that maximum allowed allocation is calculated "
>         + "by scheduler based on maximum resource of registered "
>         + "NodeManagers, which might be less than configured "
>         + "maximum allocation="
>         + ResourceUtils.getResourceTypesMaximumAllocation());{noformat}
> After a scheduler refresh (e.g. via refreshQueues), GPU scheduling works 
> again.
> I think the root cause is that on scheduler refresh, resource-types.xml is 
> loaded in CapacitySchedulerConfiguration (as part of YARN-7738), so when 
> CapacitySchedulerConfiguration#getMaximumAllocationPerQueue calls 
> ResourceUtils#fetchMaximumAllocationFromConfig, it can fetch the 
> {{yarn.resource-types}} config. However, resource-types.xml is not loaded into 
> the conf in CapacityScheduler#initScheduler, so the custom resource is not 
> found when max allocations are computed, and its max allocation ends up as 0.
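
The root-cause chain above can be illustrated with a small self-contained model. This is a hypothetical sketch, not Hadoop code: SimpleConf, maxAllocation, isValidRequest and resourceTypesXml are invented for illustration, and the key names (yarn.resource-types plus a per-type ".maximum-allocation" suffix) are modeled on YARN's resource-types settings, not taken from the attached patch. It shows how a conf that never loaded resource-types.xml yields a max allocation of 0 for the GPU type, so any positive GPU request fails the "< 0 or greater than maximum" check quoted above, while a conf that did load the file sees the configured maximum.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration; NOT the real class.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    // Models loading an XML resource such as resource-types.xml:
    // every property from the file is merged into this conf.
    void addResource(Map<String, String> fileContents) {
        props.putAll(fileContents);
    }

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

class MaxAllocationDemo {
    static final String GPU = "yarn.io/gpu";
    // Key names modeled on YARN's resource-types settings (assumed here).
    static final String RESOURCE_TYPES = "yarn.resource-types";
    static final String MAX_ALLOCATION_SUFFIX = ".maximum-allocation";

    // Hypothetical analogue of the per-type max allocation lookup:
    // a type the conf does not declare falls back to 0.
    static long maxAllocation(SimpleConf conf, String type) {
        List<String> declared = Arrays.asList(conf.get(RESOURCE_TYPES, "").split(","));
        if (!declared.contains(type)) {
            return 0L;
        }
        return Long.parseLong(conf.get(RESOURCE_TYPES + "." + type + MAX_ALLOCATION_SUFFIX, "0"));
    }

    // Same shape as the quoted check: reject requests below 0 or above the max.
    static boolean isValidRequest(long requested, long maximum) {
        return requested >= 0 && requested <= maximum;
    }

    // Contents of a hypothetical resource-types.xml declaring one GPU type.
    static Map<String, String> resourceTypesXml() {
        Map<String, String> file = new HashMap<>();
        file.put(RESOURCE_TYPES, GPU);
        file.put(RESOURCE_TYPES + "." + GPU + MAX_ALLOCATION_SUFFIX, "8");
        return file;
    }

    public static void main(String[] args) {
        // Startup path: resource-types.xml was never added to this conf.
        SimpleConf initConf = new SimpleConf();
        long initMax = maxAllocation(initConf, GPU);
        System.out.println("startup: max=" + initMax + ", 1 GPU valid=" + isValidRequest(1, initMax));

        // Refresh path: the file is loaded, so the GPU type is visible.
        SimpleConf refreshedConf = new SimpleConf();
        refreshedConf.addResource(resourceTypesXml());
        long refreshedMax = maxAllocation(refreshedConf, GPU);
        System.out.println("refresh: max=" + refreshedMax + ", 1 GPU valid=" + isValidRequest(1, refreshedMax));
    }
}
```

Running main prints "startup: max=0, 1 GPU valid=false" followed by "refresh: max=8, 1 GPU valid=true". Under this model, loading the same file on the init path as well would make the two paths agree.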



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
