[ 
https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305469#comment-17305469
 ] 

Qi Zhu commented on YARN-10517:
-------------------------------

I meet the problem too.

I fixed it and add corresponding test in  YARN-10517.001.patch.

 [~epayne] [~pbacsko] [~gandras]  [~ebadger]  Could you help review this?

Thanks.:D

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> ------------------------------------------------------------------------------
>
>                 Key: YARN-10517
>                 URL: https://issues.apache.org/jira/browse/YARN-10517
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0, 3.3.0
>            Reporter: sibyl.lv
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, 
> wrong metrics.png
>
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still has 
> incorrect allocated jmx, such as  {color:#660e7a}allocatedMB, 
> {color}{color:#660e7a}allocatedVCores and 
> {color}{color:#660e7a}allocatedContainers, {color}when the node partition is 
> updated from "DEFAULT" to other label and there are  running applications.
> Steps to reproduce
> ==============
>  # Configure capacity-scheduler.xml with label configuration
>  # Submit one application to default partition and run
>  # Add label "tpcds" to cluster and replace label on node1 and node2 to be 
> "tpcds" when the above application is running
>  # Note down "VCores Used" at Web UI
>  # When the application is finished, the metrics get wrong (screenshots 
> attached).
> ==============
>  
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles 
> this event {color:#660e7a}NODE_LABELS_UPDATE.{color}
> So we should release container resource from old partition and add used 
> resource to new partition, just as updating queueUsage.
> {code:java}
> // code placeholder
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to