[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183343#comment-15183343
 ] 

Varun Saxena commented on YARN-4712:
------------------------------------

Thanks [~Naganarasimha] for the patch.
bq. IMO cpuUsageTotalCoresPercentage is important to gauge how much of the 
cluster's CPU is getting utilized; if it's cpuUsagePercentPerCore, I believe it 
doesn't give the cluster's CPU on aggregation from all containers. In fact we 
need to report both, and also IMO cpuUsageTotalCoresPercentage is not calculated 
properly it should be
In ContainersMonitorImpl, we calculate CPU usage per container process.
There are 2 primary CPU values here. 
{{cpuUsagePercentPerCore}} is similar to what we see in top command output, 
i.e. if we have a 4-core machine and 3 of the cores are used by a specific 
process, we will see CPU% as 300%.
On Linux, this value is calculated by reading {{/proc/<pid>/stat}}, which 
gives the amount of time (in CPU jiffies) the process has spent in kernel and 
user space. The effective CPU% is then derived from the difference between the 
sample read earlier and the sample read now.
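
To make the sampling step concrete, here is a minimal sketch of how a CPU% could be derived from two jiffy samples. It is not the actual Hadoop code: the class name, method signature and the {{jiffyLengthMillis}} parameter are hypothetical, and the -1 sentinel only mirrors the UNAVAILABLE convention mentioned in the issue description.

```java
/** Hypothetical helper for illustration only; not the Hadoop implementation. */
final class CpuPercentSketch {
    /** Sentinel mirroring the UNAVAILABLE (-1) convention from the issue description. */
    static final float UNAVAILABLE = -1f;

    /**
     * Derives a top-style CPU% from two cumulative jiffy samples.
     *
     * @param prevJiffies       user + kernel jiffies from the earlier sample (negative if none yet)
     * @param currJiffies       user + kernel jiffies from the current sample
     * @param elapsedMillis     wall-clock time between the two samples
     * @param jiffyLengthMillis assumed duration of one jiffy in milliseconds
     * @return CPU% where 100 means one fully used core, or UNAVAILABLE
     */
    static float cpuUsagePercentPerCore(long prevJiffies, long currJiffies,
                                        long elapsedMillis, long jiffyLengthMillis) {
        // Without a valid earlier sample or a positive interval there is nothing to compare.
        if (prevJiffies < 0 || currJiffies < prevJiffies || elapsedMillis <= 0) {
            return UNAVAILABLE;
        }
        // CPU time consumed by the process between the two samples.
        long cpuMillis = (currJiffies - prevJiffies) * jiffyLengthMillis;
        // Express it as a percentage of the wall-clock interval.
        return cpuMillis * 100f / elapsedMillis;
    }
}
```

With an assumed 10 ms jiffy, a process that accumulates 300 jiffies over a 1000 ms interval has used 3000 ms of CPU time, which corresponds to the 300% case described above.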
{{cpuUsageTotalCoresPercentage}}, on the other hand, is a normalized value 
based on the number of processors configured for the node.
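
As a rough illustration of the normalization, the sketch below divides the per-core value by the processor count and passes the UNAVAILABLE sentinel through unchanged. The class and method names are hypothetical; only the division itself comes from the issue description, and the guard is one possible way to keep a -1 input from turning into a small negative number that downstream UNAVAILABLE checks would miss.

```java
/** Hypothetical helper for illustration only; not the Hadoop implementation. */
final class CpuNormalizationSketch {
    /** Sentinel mirroring the UNAVAILABLE (-1) convention from the issue description. */
    static final float UNAVAILABLE = -1f;

    /**
     * Normalizes a top-style CPU% (100 = one full core) to a share of the whole node.
     *
     * @param cpuUsagePercentPerCore per-core CPU%, or UNAVAILABLE
     * @param numProcessors          number of processors configured for the node
     * @return CPU% of the whole node (0 to 100), or UNAVAILABLE
     */
    static float cpuUsageTotalCoresPercentage(float cpuUsagePercentPerCore,
                                              int numProcessors) {
        // Propagate the sentinel instead of dividing it, so later checks still see it.
        if (cpuUsagePercentPerCore < 0 || numProcessors <= 0) {
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }
}
```

For the 4-core example above, 300% per core normalizes to 75% of the node, whereas an unguarded division of -1 by 4 would yield -0.25.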
{{nodeCpuPercentageForYARN}} is a config which places an upper limit on the 
CPU to be used by containers. IIUC this config has to be used with cgroups 
(although it's not restricted in code).
This config is factored in while reporting CPU resource utilization in the 
Node HB to RM.
In a heterogeneous cluster, all 3 of these values may be useful. Thoughts?

> CPU Usage Metric is not captured properly in YARN-2928
> ------------------------------------------------------
>
>                 Key: YARN-4712
>                 URL: https://issues.apache.org/jira/browse/YARN-4712
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, 
> YARN-4712-YARN-2928.v1.004.patch
>
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that many times the CPU usage returned by 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still does the calculation, i.e. {{cpuUsageTotalCoresPercentage = 
> cpuUsagePercentPerCore /resourceCalculatorPlugin.getNumProcessors()}}, because 
> of which the UNAVAILABLE check in 
> {{NMTimelinePublisher.reportContainerResourceUsage}} is never hit. So proper 
> checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainerMonitor is publishing decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
