[
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187558#comment-15187558
]
Sunil G commented on YARN-4308:
-------------------------------
Hi [~djp], [~vvasudev],
Recently we had a few discussions in one of the ATS metrics JIRAs, YARN-4172,
regarding -1 handling for CPU usage.
Agreeing that we need to send -1 when no reading is available, I would
like to point out 2 cases here:
1. When the CPU sample is taken for the {{first time}}, the current code in
{{CpuTimeTracker}} sends -1. It is debatable whether we should send -1 in
this case. Maybe we could start with 0, or even wait for a cycle before
reporting back.
2. If {{CpuTimeTracker#getCpuTrackerUsagePercent}} returns -1, we can send
the reading back to the caller as is. There is no need to do any arithmetic
on it; {{ResourceCalculatorProcessTree.UNAVAILABLE}} can be returned as the
CPU usage.
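To make the two cases concrete, here is a rough standalone sketch. It is not the actual Hadoop code: the class {{CpuUsageSketch}} and its methods {{sample}}, {{perCore}} and {{orZero}} are hypothetical names, the local {{UNAVAILABLE}} constant only stands in for {{ResourceCalculatorProcessTree.UNAVAILABLE}}, and the per-core division is just an assumed example of the kind of arithmetic that should be skipped for -1.

```java
// Hypothetical stand-in for illustration only; not the Hadoop classes.
final class CpuUsageSketch {
    // Plays the role of ResourceCalculatorProcessTree.UNAVAILABLE (-1).
    static final int UNAVAILABLE = -1;

    private long lastCpuTimeMs = UNAVAILABLE;
    private long lastSampleTimeMs = UNAVAILABLE;

    // Returns CPU usage in percent of one core, or UNAVAILABLE when there
    // is no previous sample to diff against (the "first time" case).
    float sample(long cumulativeCpuTimeMs, long nowMs) {
        float usage = UNAVAILABLE;
        if (lastSampleTimeMs != UNAVAILABLE && nowMs > lastSampleTimeMs) {
            usage = (cumulativeCpuTimeMs - lastCpuTimeMs) * 100f
                / (nowMs - lastSampleTimeMs);
        }
        lastCpuTimeMs = cumulativeCpuTimeMs;
        lastSampleTimeMs = nowMs;
        return usage;
    }

    // Case 2: pass the sentinel through untouched; only scale real readings.
    static float perCore(float trackerPercent, int numProcessors) {
        if (trackerPercent < 0) {
            return UNAVAILABLE;
        }
        return trackerPercent / numProcessors;
    }

    // One option for case 1: report 0 instead of the sentinel.
    static float orZero(float usage) {
        return usage < 0 ? 0f : usage;
    }
}
```

With this shape, the first call to {{sample}} yields the sentinel, {{perCore}} forwards it unchanged instead of dividing it, and the caller decides separately whether to map it to 0 or skip that cycle.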
If you agree, I can upload a new patch.
> ContainersAggregated CPU resource utilization reports negative usage in first
> few heartbeats
> --------------------------------------------------------------------------------------------
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.1
> Reporter: Sunil G
> Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as a negative
> value in the first few heartbeat cycles. I added a new debug print and received
> the values below from heartbeats.
> {noformat}
> INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ContainersResource Utilization : CpuTrackerUsagePercent : -1.0
> INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ContainersResource Utilization : CpuTrackerUsagePercent : 198.94598
> {noformat}
> It is better to send 0 as CPU usage rather than a negative value in
> heartbeats, even though it happens only in the first few heartbeats.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)