[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184400#comment-15184400 ]
Naganarasimha G R commented on YARN-4712:
-----------------------------------------
Thanks [~sjlee0] for the clarification with the example. I agree that there is no
point in summing up {{cpuUsageTotalCoresPercentage}}, but, as you also mentioned,
there are issues with the other approach (aggregating *cpuUsagePercentPerCore*):
# The total number of cores in the cluster would be required to actually arrive at
a percentage. This also adds complexity when nodes go down intermittently before
the aggregation, and the effective usage should never exceed the actual number of
cluster cores.
# The total number of cores should not be just the machine's cores, but the share
actually made available to YARN (*nodeCpuPercentageForYARN*).
# Even then it might not be a fair comparison, since the type of core may also
differ in a heterogeneous cluster. Usually we apply some multiplication factor so
that the vcores match overall, so maybe we need to consider this factor too?
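To make points 1-3 concrete, here is a rough Java sketch of how per-node *cpuUsagePercentPerCore* values might be aggregated into a cluster-wide percentage. It assumes a hypothetical {{NodeCpuSample}} holding the per-core usage, the node's processor count, *nodeCpuPercentageForYARN* and a hypothetical per-node {{coreWeight}} multiplier for heterogeneous cores. Skipping UNAVAILABLE samples (e.g. nodes that went down) and capping each node at its YARN share are my own assumptions, not an agreed design.

```java
import java.util.List;

// Hypothetical sketch, not the ATSv2 implementation: aggregates per-node
// cpuUsagePercentPerCore (100 == one fully busy core) into a percentage of
// the CPU capacity actually offered to YARN across the cluster.
public class ClusterCpuAggregationSketch {

    /** Per-node sample; all fields are illustrative assumptions. */
    record NodeCpuSample(float cpuUsagePercentPerCore,   // e.g. 250 == 2.5 cores busy, -1 == UNAVAILABLE
                         int numProcessors,              // physical cores on the node
                         float nodeCpuPercentageForYARN, // share of the node's CPU given to YARN, 0..100
                         float coreWeight) {}            // hypothetical multiplier for heterogeneous cores

    /**
     * Returns cluster CPU usage as a percentage of YARN-available capacity,
     * or -1 if no node contributed a valid sample.
     */
    static float clusterCpuPercent(List<NodeCpuSample> samples) {
        double usedCores = 0;
        double availableCores = 0;
        for (NodeCpuSample s : samples) {
            if (s.cpuUsagePercentPerCore() < 0) {
                continue; // skip UNAVAILABLE samples, e.g. a node that went down
            }
            double yarnCores = s.numProcessors() * s.nodeCpuPercentageForYARN() / 100.0;
            // Assumption: a node's usage is capped at the cores it offers to YARN.
            double used = Math.min(s.cpuUsagePercentPerCore() / 100.0, yarnCores);
            usedCores += used * s.coreWeight();
            availableCores += yarnCores * s.coreWeight();
        }
        if (availableCores == 0) {
            return -1;
        }
        return (float) (100.0 * usedCores / availableCores);
    }

    public static void main(String[] args) {
        List<NodeCpuSample> samples = List.of(
            new NodeCpuSample(200f, 4, 100f, 1f),  // 2 of 4 YARN cores busy
            new NodeCpuSample(-1f, 8, 100f, 1f),   // UNAVAILABLE, ignored
            new NodeCpuSample(300f, 8, 50f, 1f));  // 3 of 4 YARN cores busy
        System.out.println(clusterCpuPercent(samples)); // prints 62.5
    }
}
```

In the sample above, 5 busy cores out of 8 YARN-available cores gives 62.5; the {{coreWeight}} field is where a vcore-style multiplication factor from point 3 could plug in.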
Either way, choosing the right approach needs more discussion, so I will raise a
new JIRA for it, as the bug we currently have would otherwise block anyone
testing ATSv2.
> CPU Usage Metric is not captured properly in YARN-2928
> ------------------------------------------------------
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch,
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch,
> YARN-4712-YARN-2928.v1.004.patch
>
>
> There are 2 issues with CPU usage collection
> * I observed that the CPU usage obtained from
> {{pTree.getCpuUsagePercent()}} is often
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor still
> performs the calculation {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore
> / resourceCalculatorPlugin.getNumProcessors()}}. Since -1 divided by the number
> of processors is no longer -1 (unless there is a single processor), the
> UNAVAILABLE check in {{NMTimelinePublisher.reportContainerResourceUsage}} is
> never triggered, so proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but
> ContainersMonitor publishes decimal values for the CPU usage.
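A minimal Java sketch of the checks for both issues, written as standalone code rather than against the real ContainersMonitor/NMTimelinePublisher classes: the raw per-core value is tested against UNAVAILABLE before dividing, and the decimal percentage is rounded to a long so it fits LongConverter. The class and method names are hypothetical, and rounding is only one option (changing the converter is another).

```java
// Hypothetical sketch: validate before normalizing, and publish longs.
public class CpuUsageCheckSketch {

    // Mirrors ResourceCalculatorProcessTree.UNAVAILABLE (-1); redefined here
    // so the sketch compiles without Hadoop on the classpath.
    static final int UNAVAILABLE = -1;

    /** Returns the total-cores percentage, or UNAVAILABLE if the sample is invalid. */
    static float toTotalCoresPercentage(float cpuUsagePercentPerCore, int numProcessors) {
        // Check the raw value first: after dividing, -1 / n != -1 for n > 1.
        if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }

    /** Rounds the decimal percentage so it can be stored by a long-based converter. */
    static long toPublishableMetric(float cpuPercentage) {
        return Math.round(cpuPercentage);
    }

    public static void main(String[] args) {
        float naive = (float) UNAVAILABLE / 4;             // what the unchecked path yields: -0.25
        System.out.println(naive == UNAVAILABLE);          // false: the UNAVAILABLE check is missed
        System.out.println(toTotalCoresPercentage(UNAVAILABLE, 4)); // -1.0
        System.out.println(toPublishableMetric(toTotalCoresPercentage(150f, 4))); // 37.5 rounds to 38
    }
}
```

Checking before the division keeps the UNAVAILABLE sentinel intact regardless of how many processors the node has.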
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)