[jira] [Commented] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator

ASF GitHub Bot (Jira) Wed, 26 Nov 2025 16:35:14 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040928#comment-18040928
 ]


ASF GitHub Bot commented on YARN-10848:
---------------------------------------

github-actions[bot] commented on PR #3246:
URL: https://github.com/apache/hadoop/pull/3246#issuecomment-3583628895

   We're closing this stale PR because it has been open for 100 days with no 
activity. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you feel like this was a mistake, or you would like to continue working 
on it, please feel free to re-open it and ask for a committer to remove the 
stale tag and review again.
   Thanks all for your contribution.




> Vcore allocation problem with DefaultResourceCalculator
> -------------------------------------------------------
>
>                 Key: YARN-10848
>                 URL: https://issues.apache.org/jira/browse/YARN-10848
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>            Reporter: Peter Bacsko
>            Assignee: Minni Mittal
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: TestTooManyContainers.java
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating 
> containers even if we run out of vcores.
> CS checks the available resources in two places. The first check is in 
> {{CapacityScheduler.allocateContainerOnSingleNode()}}:
> {noformat}
>     if (calculator.computeAvailableContainers(Resources
>             .add(node.getUnallocatedResource(),
>                 node.getTotalKillableResources()),
>         minimumAllocation) <= 0) {
>       LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
>           + "available or preemptible resource for minimum allocation");
> {noformat}
> The second, which is more important, is located in 
> {{RegularContainerAllocator.assignContainer()}}:
> {noformat}
>     if (!Resources.fitsIn(rc, capability, totalResource)) {
>       LOG.warn("Node : " + node.getNodeID()
>           + " does not have sufficient resource for ask : " + pendingAsk
>           + " node total capability : " + node.getTotalResource());
>       // Skip this locality request
>       ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
>           activitiesManager, node, application, schedulerKey,
>           ActivityDiagnosticConstant.
>               NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
>               + getResourceDiagnostics(capability, totalResource),
>           ActivityLevel.NODE);
>       return ContainerAllocation.LOCALITY_SKIPPED;
>     }
> {noformat}
> Here, {{rc}} is the resource calculator instance; the other two values are:
> {noformat}
>     Resource capability = pendingAsk.getPerAllocationResource();
>     Resource available = node.getUnallocatedResource();
> {noformat}
> There is a repro unit test attached to this case, which demonstrates the 
> problem. The root cause is that we pass the resource calculator to 
> {{Resources.fitsIn()}}. Instead, we should use the overloaded version that 
> takes no calculator, just like in {{FSAppAttempt.assignContainer()}}:
> {noformat}
>     // Can we allocate a container on this node?
>     if (Resources.fitsIn(capability, available)) {
>       // Inform the application of the new container for this request
>       RMContainer allocatedContainer =
>           allocate(type, node, schedulerKey, pendingAsk,
>               reservedContainer);
> {noformat}
> In CS, if we switch to DominantResourceCalculator OR use 
> {{Resources.fitsIn()}} without the calculator in 
> {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit 
> test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
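
For illustration only, the difference between the two checks described above can be sketched without any Hadoop dependency. The class and method names below (FitsInSketch, Res, fitsInMemoryOnly, fitsInAllTypes) are hypothetical stand-ins, not YARN APIs, and the sketch assumes that the calculator-based check with DefaultResourceCalculator compares memory only, while the calculator-free check compares every resource type.

```java
public class FitsInSketch {

    /** Hypothetical, simplified stand-in for a YARN Resource (memory + vcores). */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /**
     * Memory-only comparison. Assumption: this mirrors what a calculator-based
     * fitsIn does when the calculator considers memory alone.
     */
    static boolean fitsInMemoryOnly(Res ask, Res available) {
        return ask.memoryMb <= available.memoryMb;
    }

    /**
     * Component-wise comparison. Assumption: this mirrors a calculator-free
     * fitsIn that checks every resource type.
     */
    static boolean fitsInAllTypes(Res ask, Res available) {
        return ask.memoryMb <= available.memoryMb
            && ask.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        Res ask = new Res(1024, 1);
        // Plenty of memory left on the node, but every vcore is already allocated.
        Res available = new Res(8192, 0);

        System.out.println("memory-only fits: " + fitsInMemoryOnly(ask, available));
        System.out.println("all-types fits:   " + fitsInAllTypes(ask, available));
    }
}
```

With these inputs the memory-only check accepts the request even though no vcores remain, while the component-wise check rejects it. That is the over-allocation pattern the report describes.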



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
