[
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183343#comment-15183343
]
Varun Saxena commented on YARN-4712:
------------------------------------
Thanks [~Naganarasimha] for the patch.
bq. IMO cpuUsageTotalCoresPercentage is important to gauge how much of the
cluster's CPU is getting utilized; if it's cpuUsagePercentPerCore, I believe it
doesn't give the cluster's CPU on aggregation from all containers. In fact we
need to report both, and also IMO cpuUsageTotalCoresPercentage is not calculated
properly it should be
In ContainersMonitorImpl, we calculate CPU usage per container process.
There are 2 primary CPU values here.
{{cpuUsagePercentPerCore}} is similar to what we see in the top command output,
i.e. if we have a 4-core machine and 3 of the cores are fully used by a specific
process, we will see CPU% as 300%.
On Linux, this value is calculated by reading {{/proc/<pid>/stat}}, from
which we get the amount of time (in CPU jiffies) the process has spent in kernel
and user space. The effective CPU% is then derived from the difference between
the sample read earlier and the sample read now.
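The sampling step can be sketched roughly as below. This is only an illustration, not the actual Hadoop code: the class and method names are hypothetical, and the jiffy length passed in (10 ms, i.e. USER_HZ = 100) is an assumption that holds on typical Linux builds.

```java
/** Hypothetical helper for illustration; not part of Hadoop. */
final class CpuPercentSketch {
    /** Mirrors the "no value yet" convention (-1) mentioned in the issue. */
    static final float UNAVAILABLE = -1f;

    /**
     * Derives a top-style CPU% (100% == one fully used core) from two samples.
     *
     * @param prevJiffies   utime + stime read from /proc/<pid>/stat earlier
     * @param currJiffies   utime + stime read from /proc/<pid>/stat now
     * @param elapsedMillis wall-clock time between the two reads
     * @param jiffyMillis   length of one jiffy in ms (assumed, e.g. 10)
     */
    static float cpuUsagePercentPerCore(long prevJiffies, long currJiffies,
                                        long elapsedMillis, long jiffyMillis) {
        // Without a valid earlier sample there is nothing to diff against.
        if (prevJiffies < 0 || currJiffies < prevJiffies || elapsedMillis <= 0) {
            return UNAVAILABLE;
        }
        long cpuMillis = (currJiffies - prevJiffies) * jiffyMillis;
        return cpuMillis * 100f / elapsedMillis;
    }
}
```

For example, 300 jiffies consumed over a 1000 ms interval with 10 ms jiffies is 3000 ms of CPU time, i.e. 300%, which matches the "3 of 4 cores" case above.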
{{cpuUsageTotalCoresPercentage}}, on the other hand, is a normalized value based
on the number of processors configured for the node.
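As a small worked example of this normalization (the class name is hypothetical, and the division follows the expression quoted in the issue description):

```java
/** Hypothetical helper for illustration; not part of Hadoop. */
final class CoreNormalizationSketch {
    /**
     * Converts a top-style percentage (100% per core) into a percentage
     * of the whole node, by dividing by the number of processors.
     */
    static float cpuUsageTotalCoresPercentage(float cpuUsagePercentPerCore,
                                              int numProcessors) {
        return cpuUsagePercentPerCore / numProcessors;
    }
}
```

So a process showing 300% on a 4-core node is using 75% of the node's total CPU.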
{{nodeCpuPercentageForYARN}} is a config which places an upper limit on the CPU
that can be used by containers. IIUC this config has to be used with cgroups
(although it's not restricted in code).
This config is factored in while reporting CPU resource utilization in the node
heartbeat (HB) to the RM.
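The exact formula is not given in this comment. The sketch below shows one plausible way the limit could be factored in, under the assumption that utilization is scaled against the share of the node YARN is allowed to use and then expressed in milli-vcores. The class, method, and the {{maxVCoresAllottedForContainers}} parameter are names assumed for illustration only.

```java
/** Hypothetical helper for illustration; not a quote of the NM code. */
final class MilliVcoresSketch {
    /**
     * @param cpuUsageTotalCoresPercentage   usage as a percentage of the whole node
     * @param maxVCoresAllottedForContainers vcores YARN may hand out on this node (assumed input)
     * @param nodeCpuPercentageForYARN       configured upper limit, in percent of the node
     * @return usage in thousandths of a vcore
     */
    static int milliVcoresUsed(float cpuUsageTotalCoresPercentage,
                               int maxVCoresAllottedForContainers,
                               int nodeCpuPercentageForYARN) {
        // Multiply by 1000 before the int conversion to keep fractional vcores.
        return (int) (cpuUsageTotalCoresPercentage * 1000
            * maxVCoresAllottedForContainers / nodeCpuPercentageForYARN);
    }
}
```

Under these assumptions, a container using 40% of a node where YARN may use 80% of the CPU, mapped to 8 vcores, would be reported as 4000 milli-vcores (4 vcores).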
In a heterogeneous cluster, all 3 of these values may be useful. Thoughts?
> CPU Usage Metric is not captured properly in YARN-2928
> ------------------------------------------------------
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch,
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch,
> YARN-4712-YARN-2928.v1.004.patch
>
>
> There are 2 issues with CPU usage collection:
> * I observed that many times the CPU usage obtained from
> {{pTree.getCpuUsagePercent()}} is
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor
> still does the calculation, i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore
> /resourceCalculatorPlugin.getNumProcessors()}}, because of which the UNAVAILABLE
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is never
> hit. So proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but
> ContainersMonitor is publishing decimal values for the CPU usage.
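Both issues in the description above can be illustrated with a short sketch. The names here are hypothetical, and rounding to a long is just one possible way to fit a long-only converter; it is not necessarily what the attached patches do.

```java
/** Hypothetical helper for illustration; not taken from the patch. */
final class CpuMetricGuardSketch {
    /** Same sentinel value (-1) the issue cites for UNAVAILABLE. */
    static final int UNAVAILABLE = -1;

    /** Normalizes only when a real sample exists, so the sentinel survives. */
    static float totalCoresPercentage(float cpuUsagePercentPerCore,
                                      int numProcessors) {
        if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }

    /** Rounds a decimal percentage so it can be stored as a long metric. */
    static long toLongMetric(float percentage) {
        if (percentage == UNAVAILABLE) {
            return UNAVAILABLE;
        }
        return Math.round((double) percentage);
    }
}
```

Without the guard, -1 divided by 4 processors becomes -0.25, which no longer equals UNAVAILABLE, so a later equality check against -1 is skipped.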
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)