[
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171188#comment-15171188
]
Varun Saxena commented on YARN-4712:
------------------------------------
[~Naganarasimha], thanks for the patch.
Looks fine overall. A few minor comments/nits.
* The javac and 2 of the checkstyle issues seem fixable.
* Should we use Math#round instead of floor? I would personally
prefer round.
* In the test, we can include an assertion with a CPU value of less than 1, or a
value reported with decimals. This way we can verify how the value is converted
after applying round or floor (whichever we use), so that nobody can change
the behavior inadvertently.
* Also, we are using a fixed sleep in the test (looping 5 times over).
Is 750 ms enough for a slow machine? Maybe use a while loop that waits until the
condition is met, with a larger overall timeout on the test, or add more
iterations, just to avoid the test failing on a slow machine.
* Nit: the message being passed to assertEquals is empty (""). Either pass a
meaningful message or call the assertEquals overload that takes no message.
* In the test, for the line
{{publisher.reportContainerResourceUsage(aContainer, PID, 1024L, -1F);}}, use
the constant {{ResourceCalculatorProcessTree.UNAVAILABLE}} instead of -1 to
avoid failure in case we change this constant's value.
* {{verifyPublishedResourceUsageMetrics(timelineClient, 1, 1024L, -999);}}.
Here -999 is used to denote the special meaning that CPU usage is not
published, but we do not explicitly check for it within the
verifyPublishedResourceUsageMetrics method. I mean, we indicate that the
number of resource metrics published is 1, but which one (MEMORY or CPU) is not
checked. Maybe use this -999, or better still
{{ResourceCalculatorProcessTree.UNAVAILABLE}}, to decide whether CPU/MEMORY has
to be checked or not. And if we get an unexpected metric, we can fail the test.
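To illustrate the round vs floor point above, here is a minimal standalone sketch. It is not taken from the patch: the class and method names are hypothetical, and it only shows how the two conversions differ for fractional CPU values, which is the kind of input the suggested extra assertion would pin down.

```java
// Hypothetical helper, not part of the patch: compares the two candidate
// conversions from a decimal CPU percentage to the long that gets stored.
class CpuMetricRounding {

    // Truncates towards negative infinity, so any value below 1 becomes 0.
    static long floorCpu(float cpuPercent) {
        return (long) Math.floor(cpuPercent);
    }

    // Math.round(float) returns the closest int, with ties rounding up.
    static long roundCpu(float cpuPercent) {
        return Math.round(cpuPercent);
    }

    public static void main(String[] args) {
        float[] samples = {0.4f, 0.6f, 12.5f};
        for (float s : samples) {
            System.out.println(s + " floor=" + floorCpu(s) + " round=" + roundCpu(s));
        }
    }
}
```

With floor, a container using 0.6% CPU would be reported as 0, while round reports it as 1. An assertion on such a value in the test would catch a later switch between the two.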
> CPU Usage Metric is not captured properly in YARN-2928
> ------------------------------------------------------
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch
>
>
> There are 2 issues with CPU usage collection:
> * I observed that the CPU usage returned by
> {{pTree.getCpuUsagePercent()}} is often
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor still
> does the calculation {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore
> /resourceCalculatorPlugin.getNumProcessors()}}, so the UNAVAILABLE
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is never
> hit. Proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but
> ContainersMonitor publishes decimal values for the CPU usage.
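
The first issue in the description can be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, UNAVAILABLE is a local stand-in for {{ResourceCalculatorProcessTree.UNAVAILABLE}} (-1 according to the description), and the actual fix in the patch may look different.

```java
// Hypothetical sketch, not the actual ContainersMonitor code.
class CpuUsageGuard {

    // Local stand-in for ResourceCalculatorProcessTree.UNAVAILABLE.
    static final int UNAVAILABLE = -1;

    // Without a guard, -1 divided by the processor count gives e.g. -0.25,
    // which no longer equals UNAVAILABLE in a later check.
    static float unguarded(float cpuUsagePercentPerCore, int numProcessors) {
        return cpuUsagePercentPerCore / numProcessors;
    }

    // With the guard, the sentinel is passed through untouched.
    static float guarded(float cpuUsagePercentPerCore, int numProcessors) {
        if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }
}
```

Checking for the sentinel before dividing keeps the downstream UNAVAILABLE comparison meaningful.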