[
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276973#comment-17276973
]
Zhankun Tang edited comment on YARN-10589 at 2/2/21, 10:02 AM:
---------------------------------------------------------------
[~zhuqi], Thanks a lot for the review!
[~tanu.ajmera], are we sure that PARTITION_SKIPPED only represents a
partition mismatch? It seems the cause could also be a placement constraint
mismatch. See "NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS":
{code:java}
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
      activitiesManager, node, application, schedulerKey,
      ActivityDiagnosticConstant.
          NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
          + ActivitiesManager.getDiagnostics(dcOpt),
      ActivityLevel.NODE);
  return ContainerAllocation.PARTITION_SKIPPED;
}
{code}
was (Author: ztang):
[~zhuqi], Thanks a lot for the review!
[~tanu.ajmera], I'm not clear on what we are doing now. If we change
PRIORITY_SKIPPED to PARTITION_SKIPPED, how does that differ from using
PRIORITY_SKIPPED to skip the node iteration?
> Improve logic of multi-node allocation
> --------------------------------------
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
> Issue Type: Task
> Affects Versions: 3.3.0
> Reporter: Tanu Ajmera
> Assignee: Tanu Ajmera
> Priority: Major
> Attachments: YARN-10589-001.patch, YARN-10589-002.patch,
> YARN-10589-003.patch
>
>
> {code:java}
> for (String partititon : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partititon);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }
> {code}
> In the above logic, if there are thousands of nodes in one partition, we
> still repeatedly access all nodes of that partition thousands of times. There
> is no break point: if the partition does not match for the first node, the
> loop should stop checking the other nodes in that partition.
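
The early exit described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from any attached patch: `PartitionSkipSketch`, `Node`, `Result`, `tryAllocate` and `allocateOnCandidates` are all hypothetical names, and the model assumes every candidate in the set carries the same partition label. It also assumes that PARTITION_SKIPPED is returned only for a partition mismatch. If the same result can also come from a placement constraint mismatch, as the comment above asks, stopping at the first node could skip nodes that would have matched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: none of these types are the real YARN classes.
final class PartitionSkipSketch {

    // Hypothetical stand-in for the allocation outcomes discussed in the thread.
    enum Result { ALLOCATED, NODE_SKIPPED, PARTITION_SKIPPED }

    // Hypothetical node model: a name, a partition label, and free container slots.
    static final class Node {
        final String name;
        final String partition;
        final int freeSlots;

        Node(String name, String partition, int freeSlots) {
            this.name = name;
            this.partition = partition;
            this.freeSlots = freeSlots;
        }
    }

    // Assumed per-node check: a partition mismatch is reported separately
    // from a node that matches the partition but has no room.
    static Result tryAllocate(Node node, String requestedPartition) {
        if (!node.partition.equals(requestedPartition)) {
            return Result.PARTITION_SKIPPED;
        }
        return node.freeSlots > 0 ? Result.ALLOCATED : Result.NODE_SKIPPED;
    }

    // Walks the candidates of one partition and stops at the first
    // PARTITION_SKIPPED, on the assumption that the remaining nodes share
    // that partition. Visited node names are appended to 'visited' so the
    // early exit is observable.
    static Result allocateOnCandidates(List<Node> candidates,
                                       String requestedPartition,
                                       List<String> visited) {
        for (Node node : candidates) {
            visited.add(node.name);
            Result result = tryAllocate(node, requestedPartition);
            if (result != Result.NODE_SKIPPED) {
                return result; // allocated, or the whole partition is a mismatch
            }
        }
        return Result.NODE_SKIPPED;
    }

    public static void main(String[] args) {
        List<Node> gpuNodes = Arrays.asList(
            new Node("n1", "gpu", 0),
            new Node("n2", "gpu", 4),
            new Node("n3", "gpu", 4));
        List<String> visited = new ArrayList<>();
        Result result = allocateOnCandidates(gpuNodes, "default", visited);
        System.out.println(result + " after visiting " + visited);
    }
}
```

With a request for a partition that none of the candidates belong to, only the first node is inspected. With a matching partition, the loop keeps going past full nodes until one has room.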