[ 
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-8476:
------------------------------
    Description: 
Hi, [~leftnoteasy]

We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
version and found some bugs.

 

Below is one of the more serious bugs I've encountered:

 
{code:java}
  LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
  assignment = queue.assignContainers(getClusterResource(), candidates,
      // TODO, now we only consider limits for parent for non-labeled
      // resources, should consider labeled resources as well.
      new ResourceLimits(labelManager
          .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
              getClusterResource())),
      SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);

  if (assignment.isFulfilledReservation()) {
    if (withNodeHeartbeat) {
      // Only update SchedulerHealth in sync scheduling, existing
      // Data structure of SchedulerHealth need to be updated for
      // Async mode
      updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
          assignment);
    }

    schedulerHealth.updateSchedulerFulfilledReservationCounts(1);

    ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
        queue.getParent().getQueueName(), queue.getQueueName(),
        ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
    ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
        node, reservedContainer.getContainerId(),
        AllocationState.ALLOCATED_FROM_RESERVED);
  } else{
    ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
        queue.getParent().getQueueName(), queue.getQueueName(),
        ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
    ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
        node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
  }

  assignment.setSchedulingMode(
      SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
  submitResourceCommitRequest(getClusterResource(), assignment);
}
{code}
 

Before we submit the assignment to the *resourceCommitterService*, we must 
check that its resource is greater than *Resources.none()*.

This is because the assignment can be 
*CSAssignment(Resources.createResource(0, 0), NodeType.NODE_LOCAL)* after 
calling the *getRootQueue().assignContainers* method, which is a meaningless 
value.

 

But we still submit it to the *resourceCommitterService*, so a bunch of 
meaningless assignments block the processing of other meaningful events.
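
A minimal sketch of the kind of guard meant here, written without Hadoop on the classpath so it runs on its own. The names *Res*, *Assignment*, and *submitIfMeaningful* are hypothetical stand-ins for illustration, not YARN classes, and "greater than none" is simplified to "any positive component". In the scheduler itself the comparison would presumably go through *Resources.greaterThan(...)* on *assignment.getResource()* and *Resources.none()* with the configured resource calculator.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitGuardSketch {

  /** Hypothetical stand-in for a YARN resource: memory in MB plus vcores. */
  static final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }

    /** Simplified assumption: "greater than none" means any positive component. */
    boolean isGreaterThanNone() {
      return memoryMb > 0 || vcores > 0;
    }
  }

  /** Hypothetical stand-in for CSAssignment; it carries only the resource. */
  static final class Assignment {
    final Res resource;

    Assignment(Res resource) {
      this.resource = resource;
    }
  }

  /**
   * Adds the assignment to the commit queue only if it actually allocates
   * or reserves something. Returns true if the assignment was enqueued.
   */
  static boolean submitIfMeaningful(List<Assignment> commitQueue,
      Assignment assignment) {
    if (assignment == null || assignment.resource == null
        || !assignment.resource.isGreaterThanNone()) {
      return false; // empty assignment: nothing to commit, so skip it
    }
    commitQueue.add(assignment);
    return true;
  }

  public static void main(String[] args) {
    List<Assignment> commitQueue = new ArrayList<>();
    // A (0, 0) assignment, like the meaningless value described above.
    boolean emptySubmitted =
        submitIfMeaningful(commitQueue, new Assignment(new Res(0, 0)));
    // An assignment that really allocates 1024 MB and 1 vcore.
    boolean realSubmitted =
        submitIfMeaningful(commitQueue, new Assignment(new Res(1024, 1)));
    System.out.println("empty submitted: " + emptySubmitted);
    System.out.println("real submitted: " + realSubmitted);
    System.out.println("queue size: " + commitQueue.size());
  }
}
```

With a guard like this, the empty assignment never reaches the commit queue, so only assignments that change scheduler state compete for the committer thread.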

 

I think this is a very serious bug! Any suggestions?

> Should check the resource of assignment is greater than Resources.none() 
> before submitResourceCommitRequest
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8476
>                 URL: https://issues.apache.org/jira/browse/YARN-8476
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>            Reporter: YunFan Zhou
>            Assignee: YunFan Zhou
>            Priority: Blocker
>


