[ 
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-8476:
------------------------------
    Priority: Minor  (was: Blocker)

> Should check the resource of assignment is greater than Resources.none() 
> before submitResourceCommitRequest
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8476
>                 URL: https://issues.apache.org/jira/browse/YARN-8476
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>            Reporter: YunFan Zhou
>            Assignee: YunFan Zhou
>            Priority: Minor
>
> Hi, [~leftnoteasy]
> We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
> version and found some bugs.
>  Below is the more serious of the bugs I've encountered:
>  
> {code:java}
>   LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
>   assignment = queue.assignContainers(getClusterResource(), candidates,
>       // TODO, now we only consider limits for parent for non-labeled
>       // resources, should consider labeled resources as well.
>       new ResourceLimits(labelManager
>           .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
>               getClusterResource())),
>       SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
>   if (assignment.isFulfilledReservation()) {
>     if (withNodeHeartbeat) {
>       // Only update SchedulerHealth in sync scheduling, existing
>       // Data structure of SchedulerHealth need to be updated for
>       // Async mode
>       updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
>           assignment);
>     }
>     schedulerHealth.updateSchedulerFulfilledReservationCounts(1);
>     ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
>         queue.getParent().getQueueName(), queue.getQueueName(),
>         ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
>     ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
>         node, reservedContainer.getContainerId(),
>         AllocationState.ALLOCATED_FROM_RESERVED);
>   } else {
>     ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
>         queue.getParent().getQueueName(), queue.getQueueName(),
>         ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
>     ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
>         node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
>   }
>   assignment.setSchedulingMode(
>       SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
>   submitResourceCommitRequest(getClusterResource(), assignment);
> }
> {code}
>  
> Before we submit the assignment to the *resourceCommitterService*, we must 
> check that its resource is greater than *Resources.none()*.
> The assignment can be *CSAssignment(Resources.createResource(0, 0), 
> NodeType.NODE_LOCAL)* after the call to *getRootQueue().assignContainers*, 
> which is a meaningless value. 
>  
> But we still submit it to the *resourceCommitterService*, so a bunch of 
> meaningless assignments block the processing of other, meaningful events.
>  
> I think this is a very serious bug! Any suggestions?
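
The check proposed above could look roughly like the following. This is a minimal, self-contained sketch and not the actual Hadoop code: `Res`, `Assignment`, `isNonEmpty`, and `submitIfNonEmpty` are hypothetical stand-ins for `Resource`, `CSAssignment`, and the call to `submitResourceCommitRequest`, and "greater than none" is assumed here to mean that at least one resource dimension is positive. In the real scheduler the comparison would presumably go through the configured `ResourceCalculator` instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the YARN types; these are NOT the Hadoop classes.
final class CommitGuardSketch {

  /** Stand-in for a YARN Resource: memory in MB plus virtual cores. */
  static final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }

    /** Mirrors the idea of Resources.none(): an all-zero resource. */
    static Res none() {
      return new Res(0, 0);
    }
  }

  /** Stand-in for CSAssignment; only the assigned resource matters here. */
  static final class Assignment {
    final Res resource;

    Assignment(Res resource) {
      this.resource = resource;
    }
  }

  /**
   * Assumption: an assignment is worth committing only if at least one
   * dimension of its resource is strictly positive.
   */
  static boolean isNonEmpty(Assignment assignment) {
    if (assignment == null || assignment.resource == null) {
      return false;
    }
    return assignment.resource.memoryMb > 0 || assignment.resource.vcores > 0;
  }

  /**
   * Queues the assignment for commit only when it is non-empty, so that
   * zero-resource assignments never reach the commit queue.
   * Returns true if the assignment was queued.
   */
  static boolean submitIfNonEmpty(Assignment assignment,
      List<Assignment> commitQueue) {
    if (!isNonEmpty(assignment)) {
      return false; // skip the meaningless (0, 0) assignment
    }
    commitQueue.add(assignment);
    return true;
  }
}
```

With this guard, a `(0, 0)` assignment is dropped before it is queued, while an assignment such as `new Res(1024, 1)` is queued as before.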



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
