[
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261872#comment-17261872
]
zhuqi commented on YARN-10504:
------------------------------
[~wangda] [~bteke]
1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update
maxApplications, but in some cases this does not work as expected, for example
in {{TestCapacitySchedulerAutoQueueCreation#testAutoCreatedQueueActivationDeactivation}}:
{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f,
1.0f, 1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
(AutoCreatedLeafQueue)
user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case the user_3 AutoCreatedLeafQueue has no capacity, so in
{{updateAbsoluteCapacitiesAndRelatedFields}}:
{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
updateAbsoluteCapacities();
CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();
// If maxApplications not set, use the system total max app, apply newly
// calculated abs capacity of the queue.
if (maxApplications <= 0) {
int maxSystemApps = schedulerConf.
getMaximumSystemApplications();
maxApplications =
(int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
}
maxApplicationsPerUser = Math.min(maxApplications,
(int) (maxApplications * (usersManager.getUserLimit() / 100.0f)
* usersManager.getUserLimitFactor()));
}
// The capacities are 0 because they are reset by updateToZeroCapacity:
if (availableCapacity >= leafQueueTemplateCapacities
.getAbsoluteCapacity(nodeLabel)) {
updateCapacityFromTemplate(capacities, nodeLabel);
activate(leafQueue, nodeLabel);
} else{
updateToZeroCapacity(capacities, nodeLabel);
}
// And because the update happens after reinitializeFromTemplate:
final AutoCreatedLeafQueueConfig initialLeafQueueTemplate =
queueManagementPolicy.getInitialLeafQueueConfiguration(leafQueue);
leafQueue.reinitializeFromTemplate(initialLeafQueueTemplate);
// Do one update cluster resource call to make sure all absolute resources
// effective resources are updated.
updateClusterResource(this.csContext.getClusterResource(),
new ResourceLimits(this.csContext.getClusterResource()));{code}
As a result, maxApplications and maxApplicationsPerUser will both be 0.
So we should handle this in the new logic at
{{//TODO recalculate max applications because they can depend on capacity}}
The TODO should be removed: either skip the AutoCreatedLeafQueue case for now,
or add logic that sets maxApplications to a fixed default number in this case.
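The arithmetic behind the zero values can be reproduced in isolation. The following is only a sketch that mirrors the two formulas quoted above; it is not the real LeafQueue code. The class name, the method names, and the fallback constant are hypothetical, and the inputs in main (10000 system apps, 100% user limit, factor 1.0) are assumed values chosen for illustration.

```java
// Sketch only: mirrors the formulas quoted from
// updateAbsoluteCapacitiesAndRelatedFields, not the real LeafQueue.
final class MaxAppsSketch {
  // Hypothetical fallback value; this is NOT an existing YARN setting.
  static final int HYPOTHETICAL_DEFAULT_MAX_APPS = 100;

  // maxApplications = (int) (maxSystemApps * absoluteCapacity)
  static int maxApplications(int maxSystemApps, float absoluteCapacity) {
    return (int) (maxSystemApps * absoluteCapacity);
  }

  // maxApplicationsPerUser = min(maxApplications,
  //     (int) (maxApplications * (userLimit / 100) * userLimitFactor))
  static int maxApplicationsPerUser(int maxApplications,
      float userLimitPercent, float userLimitFactor) {
    return Math.min(maxApplications,
        (int) (maxApplications * (userLimitPercent / 100.0f)
            * userLimitFactor));
  }

  // One of the two options from the comment: fall back to a fixed
  // default when the computed value is 0 (assumed behaviour).
  static int maxApplicationsWithFallback(int maxSystemApps,
      float absoluteCapacity) {
    int computed = maxApplications(maxSystemApps, absoluteCapacity);
    return computed > 0 ? computed : HYPOTHETICAL_DEFAULT_MAX_APPS;
  }

  public static void main(String[] args) {
    // Absolute capacity 0, as for the user_3 leaf queue in the test.
    int maxApps = maxApplications(10000, 0.0f);
    System.out.println(maxApps);                                      // 0
    System.out.println(maxApplicationsPerUser(maxApps, 100.0f, 1.0f)); // 0
    System.out.println(maxApplicationsWithFallback(10000, 0.0f));      // 100
  }
}
```

With an absolute capacity of 0 both limits collapse to 0 regardless of the user limit settings, which is why the zero-capacity auto-created queue needs special handling.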
2. As mentioned by [~bteke]
"Sharing my latest findings on TestAbsoluteResourceWithAutoQueue failure:
{{AutoCreatedLeafQueue#reinitializeFromTemplate}} was refactored, now the
getting and merging the QueueCapacities happens *before* calling the
{{ParentQueue#updateClusterResource}} (and
{{LeafQueue#updateClusterResource}}). In {{LeafQueue#updateClusterResource}}
the {{AbstractCSQueue#updateEffectiveResources}} is called where the
effectiveMinResource of the created queue is overridden with the template's
effectiveMinResources which is exactly the same the test is getting in the
asserts."
We should change {{LeafQueue#updateClusterResource}} to:
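{code:java}
public void updateClusterResource(Resource clusterResource,
    ResourceLimits currentResourceLimits) {
  writeLock.lock();
  try {
    ...
    // Skip auto-created leaf queues so that their effective resources
    // are not overridden here.
    if (!(this instanceof AutoCreatedLeafQueue)) {
      super.updateEffectiveResources(clusterResource);
    }
    ...
  } finally {
    writeLock.unlock();
  }
}{code}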
This will fix the absolute resource case in TestAbsoluteResourceWithAutoQueue.
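To make the effect of the guard concrete, here is a toy model of it. It is a sketch under strong simplifications, not the scheduler code: the classes SketchLeafQueue and SketchAutoCreatedLeafQueue, the single integer standing in for the effective minimum resource, and the "recomputed" value are all hypothetical stand-ins for the real queue classes and Resource objects.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy stand-in for LeafQueue; all names here are hypothetical.
class SketchLeafQueue {
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();
  private int effectiveMin;        // value currently held by the queue
  private final int recomputedMin; // value the update step would write

  SketchLeafQueue(int effectiveMin, int recomputedMin) {
    this.effectiveMin = effectiveMin;
    this.recomputedMin = recomputedMin;
  }

  // Stand-in for updateEffectiveResources: overwrites the held value.
  protected void updateEffectiveResources() {
    effectiveMin = recomputedMin;
  }

  void updateClusterResource() {
    writeLock.lock();
    try {
      // The proposed guard: auto-created leaf queues keep their value.
      if (!(this instanceof SketchAutoCreatedLeafQueue)) {
        updateEffectiveResources();
      }
    } finally {
      writeLock.unlock();
    }
  }

  int getEffectiveMin() {
    return effectiveMin;
  }
}

// Toy stand-in for AutoCreatedLeafQueue.
class SketchAutoCreatedLeafQueue extends SketchLeafQueue {
  SketchAutoCreatedLeafQueue(int effectiveMin, int recomputedMin) {
    super(effectiveMin, recomputedMin);
  }
}
```

In this model a plain SketchLeafQueue ends up with the recomputed value after updateClusterResource(), while a SketchAutoCreatedLeafQueue keeps the value it already had.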
Do you have any other advice?
Thanks.
> Implement weight mode in Capacity Scheduler
> -------------------------------------------
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch,
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch,
> YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow the possibility to flexibly create queues in Capacity Scheduler a
> weight mode should be introduced. The existing {{capacity}} property should
> be used with a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the
> weights during launch and every time the queue structure changes
--
This message was sent by Atlassian Jira
(v8.3.4#803005)