[ https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904280#comment-16904280 ]
Jonathan Hung commented on YARN-9730:
-------------------------------------
Attached the 001 patch. Items 1a and 1b from the description are handled in
{{SchedulerUtils#enforcePartitionExclusivity}}. Item 2 is handled in
{{FifoOrderingPolicyWithExclusivePartitions}}.
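For anyone who has not opened the patch, the override rules in 1a and 1b (quoted in the description below) can be sketched roughly as follows. This is an illustration only, not the code in the patch: the class {{PartitionExclusivitySketch}}, the method {{enforce}} and its signature are hypothetical, and labels are modelled as plain strings, with the empty string assumed to stand for the default partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of rules 1a and 1b; NOT the code from YARN-9730.001.patch.
final class PartitionExclusivitySketch {

    /**
     * Returns the label each resource request should carry after enforcement.
     *
     * @param requestLabels label of each resource request ("" = default partition, an assumption)
     * @param appLabel      label from the app's submission context
     * @param enforced      partitions configured as exclusive-enforced
     */
    static List<String> enforce(List<String> requestLabels,
                                String appLabel,
                                Set<String> enforced) {
        boolean appIsEnforced = enforced.contains(appLabel);
        List<String> result = new ArrayList<>(requestLabels.size());
        for (String label : requestLabels) {
            if (appIsEnforced) {
                // 1a: the app targets an enforced partition, so every request follows it.
                result.add(appLabel);
            } else if (enforced.contains(label)) {
                // 1b: the app targets some other partition, so a request may not
                // ask for an enforced partition; fall back to the app's label.
                result.add(appLabel);
            } else {
                // Requests for non-enforced partitions are left alone.
                result.add(label);
            }
        }
        return result;
    }
}
```

With {{P}} enforced, an app submitted with label {{P}} ends up with all of its requests on {{P}}, while an app submitted with label {{Q}} only has its {{P}} requests rewritten to {{Q}}.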
Configuration needed to enable this for partition {{P}}:
* {{yarn-site.xml}}
{noformat}
<property>
  <name>yarn.node-labels.exclusive-enforced-partitions</name>
  <value>P</value>
</property>
{noformat}
* {{capacity-scheduler.xml}}
{noformat}
<property>
  <name>yarn.scheduler.capacity.<queue-path>.ordering-policy</name>
  <value>fifo-with-partitions</value>
</property>
<property>
  <name>yarn.scheduler.capacity.<queue-path>.ordering-policy.exclusive-enforced-partitions</name>
  <value>P</value>
</property>
{noformat}
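The idea behind item 2 (a separate data structure per enforced partition, so a node heartbeat only walks the apps that could possibly match) can be sketched as below. Again this is an assumption-laden illustration and not the patch's {{FifoOrderingPolicyWithExclusivePartitions}}: the class {{PartitionedFifoSketch}} and its methods are hypothetical, apps are reduced to string IDs, and FIFO order is approximated by insertion order rather than the real comparator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of item 2; NOT the ordering policy shipped in the patch.
final class PartitionedFifoSketch {
    private final Set<String> enforced;
    // One insertion-ordered bucket per enforced partition.
    private final Map<String, Set<String>> enforcedApps = new HashMap<>();
    // Every app whose label is not an enforced partition.
    private final Set<String> otherApps = new LinkedHashSet<>();

    PartitionedFifoSketch(Set<String> enforced) {
        this.enforced = enforced;
    }

    /** Files the app under its own partition if that partition is enforced. */
    void addApp(String appId, String appLabel) {
        if (enforced.contains(appLabel)) {
            enforcedApps.computeIfAbsent(appLabel, k -> new LinkedHashSet<>()).add(appId);
        } else {
            otherApps.add(appId);
        }
    }

    /** Apps worth trying when a node in the given partition heartbeats. */
    List<String> candidatesFor(String nodePartition) {
        if (enforced.contains(nodePartition)) {
            Set<String> apps = enforcedApps.get(nodePartition);
            return apps == null ? new ArrayList<>() : new ArrayList<>(apps);
        }
        // Non-enforced node: schedule the rest of the apps as normal.
        return new ArrayList<>(otherApps);
    }
}
```

The point of the split is that a heartbeat from a node outside {{P}} never iterates over the apps pinned to {{P}}, which is where the wasted work in the use case comes from.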
> Support forcing configured partitions to be exclusive based on app node label
> -----------------------------------------------------------------------------
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Priority: Major
> Attachments: YARN-9730.001.patch
>
>
> Use case: queue X has all of its workload in a non-default (exclusive)
> partition P (each app sets its submission context's node label to P). A node
> in partition Q != P heartbeats to the RM. The capacity scheduler loops through
> every application in X, and every scheduler key in each application, and fails
> to allocate each time since the app's requested label and the node's label
> don't match. This causes a huge performance degradation when the number of
> apps in X is large.
> To fix the issue, allow the RM to configure partitions as "forced-exclusive". If
> partition P is "forced-exclusive", then:
> * 1a. If an app sets its submission context's node label to P, all of its
> resource requests will be overridden to P.
> * 1b. If an app sets its submission context's node label to Q, any of its
> resource requests whose label is P will be overridden to Q.
> * 2. In the scheduler, we add apps with node label expression P to a
> separate data structure. When a node in partition P heartbeats to the scheduler,
> we only try to schedule apps in this data structure. When a node in partition
> Q heartbeats to the scheduler, we schedule the rest of the apps as normal.