Jonathan Hung created YARN-9992:
-----------------------------------
Summary: Max allocation per queue is zero for custom resource
types on RM startup
Key: YARN-9992
URL: https://issues.apache.org/jira/browse/YARN-9992
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jonathan Hung
Found an issue where GPU requests on a newly booted RM cannot be
scheduled. The request is rejected with the exception thrown in
SchedulerUtils#throwInvalidResourceException:
{noformat}
throw new InvalidResourceRequestException(
"Invalid resource request, requested resource type=[" + reqResourceName
+ "] < 0 or greater than maximum allowed allocation. Requested "
+ "resource=" + reqResource + ", maximum allowed allocation="
+ availableResource
+ ", please note that maximum allowed allocation is calculated "
+ "by scheduler based on maximum resource of registered "
+ "NodeManagers, which might be less than configured "
+ "maximum allocation="
+ ResourceUtils.getResourceTypesMaximumAllocation());
{noformat}
After refreshing the scheduler (e.g. via refreshQueues), GPU scheduling works.
I think the root cause is that on scheduler refresh, resource-types.xml is
loaded in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call
ResourceUtils#fetchMaximumAllocationFromConfig in
CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it is able to fetch
the {{yarn.resource-types}} config. But resource-types.xml is not loaded into
the conf in CapacityScheduler#initScheduler, so at startup the custom resource
is not found when computing max allocations, and its max allocation is 0.
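To illustrate the suspected mechanism, here is a minimal standalone sketch. It is not the Hadoop code: the class MaxAllocationSketch, its two methods, and the use of a plain Map as the configuration are hypothetical stand-ins, and the per-type maximum-allocation key is used only for illustration. It shows how a custom resource that is missing from {{yarn.resource-types}} ends up with an effective maximum of 0, so any positive request is rejected.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MaxAllocationSketch {

    // Hypothetical stand-in for fetching max allocations from the conf:
    // only resource types listed under yarn.resource-types are considered.
    public static Map<String, Long> fetchMaxAllocation(Map<String, String> conf) {
        Map<String, Long> max = new LinkedHashMap<>();
        String types = conf.getOrDefault("yarn.resource-types", "");
        for (String type : types.split(",")) {
            String name = type.trim();
            if (name.isEmpty()) {
                continue; // nothing configured, so no custom types are known
            }
            // Illustrative key; unset means "no explicit cap" in this sketch.
            String key = "yarn.resource-types." + name + ".maximum-allocation";
            String value = conf.getOrDefault(key, String.valueOf(Long.MAX_VALUE));
            max.put(name, Long.parseLong(value));
        }
        return max;
    }

    // A resource type that was never discovered is treated as having max 0,
    // so every positive request for it is invalid.
    public static boolean isRequestValid(Map<String, Long> maxAlloc,
                                         String type, long requested) {
        long max = maxAlloc.getOrDefault(type, 0L);
        return requested >= 0 && requested <= max;
    }

    public static void main(String[] args) {
        // Startup: resource-types.xml was not loaded into the conf.
        Map<String, String> startupConf = new HashMap<>();

        // After refresh: resource-types.xml contents are present.
        Map<String, String> refreshedConf = new HashMap<>();
        refreshedConf.put("yarn.resource-types", "yarn.io/gpu");
        refreshedConf.put("yarn.resource-types.yarn.io/gpu.maximum-allocation", "8");

        System.out.println("startup: "
            + isRequestValid(fetchMaxAllocation(startupConf), "yarn.io/gpu", 1));
        System.out.println("after refresh: "
            + isRequestValid(fetchMaxAllocation(refreshedConf), "yarn.io/gpu", 1));
    }
}
```

In this sketch the startup case rejects a request for one GPU and the refreshed case accepts it, which matches the behavior described above. Whether the actual fix should load resource-types.xml in CapacityScheduler#initScheduler or elsewhere is left open here.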