[
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276369#comment-17276369
]
zhuqi edited comment on YARN-10589 at 2/1/21, 3:00 PM:
-------------------------------------------------------
Thanks for [~ztang] [~tanu.ajmera] review and the patch.
It make sense to me now.
I have reviewed the patch, it break the PRIORITY_SKIPPED logic.
In Schedule in priority order:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
ContainerAllocation result = allocate(clusterResource, candidates,
schedulingMode, resourceLimits, schedulerKey, null);
AllocationState allocationState = result.getAllocationState();
if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
}
return getCSAssignmentFromAllocateResult(clusterResource, result,
null, node);
}
// will skip priority not meet
if (reservedContainer == null) {
result = preCheckForNodeCandidateSet(clusterResource, node,
schedulingMode, resourceLimits, schedulerKey);
if (null != result) {
continue;
}
}
{code}
I think the original logic is right, in preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
Optional.empty() :
activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.
NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
+ ActivitiesManager.getDiagnostics(dcOpt),
ActivityLevel.NODE);
return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We just skipped the priority of application.
If we change to return ContainerAllocation.PARTITION_SKIPPED in patch, we
should also skip this priority.
I change the code to :
{code:java}
// When the partition not meet the priority it will return
// AllocationState.PARTITION_SKIPPED, this should also skip.
if (allocationState == AllocationState.PRIORITY_SKIPPED
|| allocationState == AllocationState.PARTITION_SKIPPED) {
continue;
}
{code}
Any other thoughts?
was (Author: zhuqi):
Thanks for [~ztang] [~tanu.ajmera] review and the patch.
It make sense to me now.
I have reviewed the patch, it break the PRIORITY_SKIPPED logic.
In Schedule in priority order:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
ContainerAllocation result = allocate(clusterResource, candidates,
schedulingMode, resourceLimits, schedulerKey, null);
AllocationState allocationState = result.getAllocationState();
if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
}
return getCSAssignmentFromAllocateResult(clusterResource, result,
null, node);
}
// will skip priority not meet
if (reservedContainer == null) {
result = preCheckForNodeCandidateSet(clusterResource, node,
schedulingMode, resourceLimits, schedulerKey);
if (null != result) {
continue;
}
}
{code}
I think the original logic is right, in preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
Optional.empty() :
activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.
NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
+ ActivitiesManager.getDiagnostics(dcOpt),
ActivityLevel.NODE);
return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped the priority, and the next priority may need this node
partition. We just skipped the priority of application.
If we change to return ContainerAllocation.PARTITION_SKIPPED in patch, we
should also skip this priority.
I change the code to :
{code:java}
// When the partition not meet the priority it will return
// AllocationState.PARTITION_SKIPPED, this should also skip.
if (allocationState == AllocationState.PRIORITY_SKIPPED
|| allocationState == AllocationState.PARTITION_SKIPPED) {
continue;
}
{code}
Any other thoughts?
> Improve logic of multi-node allocation
> --------------------------------------
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
> Issue Type: Task
> Affects Versions: 3.3.0
> Reporter: Tanu Ajmera
> Assignee: Tanu Ajmera
> Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partititon : partitions) {
> if (current++ > start) {
> break;
> }
> CandidateNodeSet<FiCaSchedulerNode> candidates =
> cs.getCandidateNodeSet(partititon);
> if (candidates == null) {
> continue;
> }
> cs.allocateContainersToNode(candidates, false);
> }{code}
> In above logic, if we have thousands of node in one partition, we will still
> repeatedly access all nodes of the partition thousands of times. There is no
> break point where if the partition is not same for the first node, it should
> stop checking other nodes in that partition.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]