[
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
YunFan Zhou updated YARN-8476:
------------------------------
Priority: Minor (was: Blocker)
> Should check the resource of assignment is greater than Resources.none()
> before submitResourceCommitRequest
> -----------------------------------------------------------------------------------------------------------
>
> Key: YARN-8476
> URL: https://issues.apache.org/jira/browse/YARN-8476
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler
> Reporter: YunFan Zhou
> Assignee: YunFan Zhou
> Priority: Minor
>
> Hi, [~leftnoteasy]
> We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our
> version and found some bugs.
> Below is one of the more serious bugs I've encountered:
>
> {code:java}
>   LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
>   assignment = queue.assignContainers(getClusterResource(), candidates,
>       // TODO, now we only consider limits for parent for non-labeled
>       // resources, should consider labeled resources as well.
>       new ResourceLimits(labelManager
>           .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
>               getClusterResource())),
>       SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
>   if (assignment.isFulfilledReservation()) {
>     if (withNodeHeartbeat) {
>       // Only update SchedulerHealth in sync scheduling, existing
>       // Data structure of SchedulerHealth need to be updated for
>       // Async mode
>       updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
>           assignment);
>     }
>     schedulerHealth.updateSchedulerFulfilledReservationCounts(1);
>     ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
>         queue.getParent().getQueueName(), queue.getQueueName(),
>         ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
>     ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
>         node, reservedContainer.getContainerId(),
>         AllocationState.ALLOCATED_FROM_RESERVED);
>   } else {
>     ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
>         queue.getParent().getQueueName(), queue.getQueueName(),
>         ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
>     ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
>         node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
>   }
>   assignment.setSchedulingMode(
>       SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
>   submitResourceCommitRequest(getClusterResource(), assignment);
> }
> {code}
>
> Before we submit the assignment to the *resourceCommitterService*, we must
> check that its resource is greater than *Resources.none()*,
> because the assignment can be *CSAssignment(Resources.createResource(0, 0),
> NodeType.NODE_LOCAL)* after calling the *getRootQueue().assignContainers*
> method, which is a meaningless value.
>
> But we still submit it to the *resourceCommitterService*, and the resulting
> flood of meaningless assignments blocks the processing of other, meaningful
> events.
>
> I think this is a very serious bug! Any suggestions?
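
The guard proposed above could look roughly like the following. This is only a minimal sketch using simplified stand-in types, not the real Hadoop classes: `Resource`, `Assignment`, `submitIfNonEmpty`, and `pendingCommits` are hypothetical names invented for illustration, and "greater than none" is assumed here to mean that at least one of memory or vcores is positive (the real comparison goes through a resource calculator and may behave differently).

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class CommitGuardSketch {

  /** Hypothetical stand-in for YARN's Resource: memory in MB plus vcores. */
  static final class Resource {
    final long memoryMb;
    final int vcores;

    Resource(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }

    /** Assumption: "greater than none" means any dimension is positive. */
    boolean isGreaterThanNone() {
      return memoryMb > 0 || vcores > 0;
    }
  }

  /** Hypothetical stand-in for CSAssignment; only the resource is modelled. */
  static final class Assignment {
    final Resource resource;

    Assignment(Resource resource) {
      this.resource = resource;
    }
  }

  /** Stand-in for the queue that feeds the commit service. */
  private final Queue<Assignment> commitQueue = new ArrayDeque<>();

  /**
   * Enqueues the assignment only when it carries a non-empty resource.
   * Returns true if the assignment was enqueued, false if it was dropped.
   */
  boolean submitIfNonEmpty(Assignment assignment) {
    if (assignment == null || !assignment.resource.isGreaterThanNone()) {
      return false; // nothing to commit, so do not occupy the queue
    }
    commitQueue.add(assignment);
    return true;
  }

  /** Number of assignments currently waiting to be committed. */
  int pendingCommits() {
    return commitQueue.size();
  }

  public static void main(String[] args) {
    CommitGuardSketch scheduler = new CommitGuardSketch();
    boolean emptySubmitted =
        scheduler.submitIfNonEmpty(new Assignment(new Resource(0, 0)));
    boolean realSubmitted =
        scheduler.submitIfNonEmpty(new Assignment(new Resource(1024, 1)));
    System.out.println("empty submitted: " + emptySubmitted);
    System.out.println("real submitted: " + realSubmitted);
    System.out.println("pending commits: " + scheduler.pendingCommits());
  }
}
```

In this sketch the zero-resource assignment never reaches the queue, so only the 1024 MB / 1 vcore assignment is left pending. Whether an empty assignment can safely be dropped in every scheduling path of the real scheduler is not something this sketch establishes.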