[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

zhuqi (Jira) Mon, 01 Feb 2021 07:01:14 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276369#comment-17276369
 ]


zhuqi edited comment on YARN-10589 at 2/1/21, 3:00 PM:
-------------------------------------------------------

Thanks for [~ztang] [~tanu.ajmera] review and the patch.

It make sense to me now.

I have reviewed the patch, it break the PRIORITY_SKIPPED logic.

In Schedule in priority order:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
      schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
    continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
      null, node);
}


// will skip priority not meet
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
      schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
    continue;
  }
}
{code}
I think the original logic is right, in preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
      activitiesManager, node, application, schedulerKey,
      ActivityDiagnosticConstant.
          NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
          + ActivitiesManager.getDiagnostics(dcOpt),
      ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We just skipped the priority of application. 

If we change to return ContainerAllocation.PARTITION_SKIPPED in patch, we 
should also skip this priority.

I change the code to :
{code:java}
// When the partition not meet the priority it will return
// AllocationState.PARTITION_SKIPPED, this should also skip.
if (allocationState == AllocationState.PRIORITY_SKIPPED
    || allocationState == AllocationState.PARTITION_SKIPPED) {
  continue;
}
{code}
Any other thoughts?

 

 


was (Author: zhuqi):
Thanks for [~ztang] [~tanu.ajmera] review and the patch.

It make sense to me now.

I have reviewed the patch, it break the PRIORITY_SKIPPED logic.

In Schedule in priority order:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
      schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
    continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
      null, node);
}


// will skip priority not meet
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
      schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
    continue;
  }
}
{code}
I think the original logic is right, in preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
      activitiesManager, node, application, schedulerKey,
      ActivityDiagnosticConstant.
          NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
          + ActivitiesManager.getDiagnostics(dcOpt),
      ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped the priority, and the next priority may need this node 
partition. We just skipped the priority of application. 

If we change to return ContainerAllocation.PARTITION_SKIPPED in patch, we 
should also skip this priority.

I change the code to :
{code:java}
// When the partition not meet the priority it will return
// AllocationState.PARTITION_SKIPPED, this should also skip.
if (allocationState == AllocationState.PRIORITY_SKIPPED
    || allocationState == AllocationState.PARTITION_SKIPPED) {
  continue;
}
{code}
Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --------------------------------------
>
>                 Key: YARN-10589
>                 URL: https://issues.apache.org/jira/browse/YARN-10589
>             Project: Hadoop YARN
>          Issue Type: Task
>    Affects Versions: 3.3.0
>            Reporter: Tanu Ajmera
>            Assignee: Tanu Ajmera
>            Priority: Major
>         Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partititon : partitions) {
>  if (current++ > start) {
>  break;
>  }
>  CandidateNodeSet<FiCaSchedulerNode> candidates =
>  cs.getCandidateNodeSet(partititon);
>  if (candidates == null) {
>  continue;
>  }
>  cs.allocateContainersToNode(candidates, false);
> }{code}
> In above logic, if we have thousands of node in one partition, we will still 
> repeatedly access all nodes of the partition thousands of times. There is no 
> break point where if the partition is not same for the first node, it should 
> stop checking other nodes in that partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

Reply via email to