[ https://issues.apache.org/jira/browse/YARN-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854763#comment-16854763 ]
Suma Shivaprasad commented on YARN-9569:
----------------------------------------
+1. Patch LGTM
> Auto-created leaf queues do not honor cluster-wide min/max memory/vcores
> ------------------------------------------------------------------------
>
> Key: YARN-9569
> URL: https://issues.apache.org/jira/browse/YARN-9569
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 3.2.0
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Attachments: YARN-9569.001.patch, YARN-9569.002.patch
>
>
> Auto-created leaf queues do not honor cluster-wide settings for maximum
> memory/vcores allocation.
> To reproduce:
> # Set auto-create-child-queue.enabled=true for a parent queue.
> # Set leaf-queue-template.maximum-allocation-mb=16384.
> # Set yarn.resource-types.memory-mb.maximum-allocation=16384 in
> resource-types.xml
> # Launch a YARN app with a container requesting 16 GB RAM.
>
> This scenario should work, but instead you get an error similar to this:
> {code:java}
> java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger
> than the cluster setting for queue root.auto.test max allocation per queue:
> <memory:16384, vCores:1> cluster setting: <memory:8192, vCores:4> {code}
>
> This seems to be caused by this code in
> ManagedParentQueue.getLeafQueueConfigs:
> {code:java}
> CapacitySchedulerConfiguration leafQueueConfigTemplate =
>     new CapacitySchedulerConfiguration(new Configuration(false), false);
> {code}
>
> This initializes a new leaf queue configuration that does not read
> resource-types.xml (or any other config). Later, this
> CapacitySchedulerConfiguration instance calls
> ResourceUtils.fetchMaximumAllocationFromConfig() from its
> getMaximumAllocationPerQueue() method and passes itself as the configuration
> to use. Since the resource types are not present, ResourceUtils falls back to
> compiled-in defaults of 8GB RAM, 4 cores.
>
> I was able to work around this with a custom AutoCreatedQueueManagementPolicy
> implementation which does something like this in init() and reinitialize():
> {code:java}
> for (Map.Entry<String, String> entry : this.scheduler.getConfiguration()) {
>   if (entry.getKey().startsWith("yarn.resource-types")) {
>     parentQueue.getLeafQueueTemplate().getLeafQueueConfigs()
>         .set(entry.getKey(), entry.getValue());
>   }
> }
> {code}
> However, this is obviously a very hacky way to solve the problem.
> I can submit a proper patch if someone can provide some direction as to the
> best way to proceed.
>
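For readers following the thread, the fallback described in the quoted report can be modeled in plain Java without any Hadoop classes. This is only a rough sketch under stated assumptions: the class LeafTemplateFallbackSketch and its methods are hypothetical stand-ins, a HashMap plays the role of the configuration object, and the 8192 MB default is taken from the "cluster setting" shown in the error message above. It is not the code from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the behavior in the report; not Hadoop code.
public class LeafTemplateFallbackSketch {
    // Key quoted in the report; the default mirrors the 8192 MB in the error.
    static final String MAX_MB_KEY =
        "yarn.resource-types.memory-mb.maximum-allocation";
    static final long DEFAULT_MAX_MB = 8192L;

    // Returns the cluster maximum, or the built-in default if the key is absent.
    static long clusterMaxMb(Map<String, String> conf) {
        String value = conf.get(MAX_MB_KEY);
        return value == null ? DEFAULT_MAX_MB : Long.parseLong(value);
    }

    // Models the check behind the error: queue max must not exceed cluster max.
    static boolean queueMaxAllowed(Map<String, String> conf, long queueMaxMb) {
        return queueMaxMb <= clusterMaxMb(conf);
    }

    // Copies every entry whose key starts with the prefix into the target,
    // in the spirit of the workaround loop quoted in the report.
    static void copyPrefixed(Map<String, String> source,
                             Map<String, String> target, String prefix) {
        for (Map.Entry<String, String> entry : source.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                target.put(entry.getKey(), entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> schedulerConf = new HashMap<>();
        schedulerConf.put(MAX_MB_KEY, "16384");

        // An empty template stands in for a configuration that loaded nothing.
        Map<String, String> leafTemplate = new HashMap<>();
        System.out.println(queueMaxAllowed(leafTemplate, 16384L)); // false

        copyPrefixed(schedulerConf, leafTemplate, "yarn.resource-types");
        System.out.println(queueMaxAllowed(leafTemplate, 16384L)); // true
    }
}
```

With the empty template the 16384 MB queue maximum is compared against the 8192 MB default and rejected; once the yarn.resource-types keys are copied over, the same check passes.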
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)