[ 
https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267794#comment-15267794
 ] 

Sangjin Lee commented on YARN-4821:
-----------------------------------

Thanks [~Naganarasimha] for the patch! I took a look at the patch, and I have 
some feedback and questions. I suspect there is more subtlety than it appears 
initially.

(1) config
Currently it is defined as "number of monitors to skip before publishing to 
timeline service." However, this is not too user-friendly. For example, if this 
value is 3, it really means we would publish every *4-th* time if I read this 
correctly. This may not be too apparent to users. My preference would be to use 
a true multiple. If we're going to emit every {{n}}-th time, we should let 
users define {{n}} as the config. Can we redefine this config?

Also, I would suggest the default value of 5 as discussed previously. 5 gives 
us a nice round number of 15 seconds (4 times a minute).

Also, can you please add it to {{yarn-default.xml}} too?

(2) keeping track of intermediate values
I am somewhat concerned about the amount of objects that get instantiated (and 
discarded) during this process. For every live container and every monitoring 
interval, it would create a new {{ResourceUtilization}} object, and also 
another {{ResourceUtilization}} object when you average. It would add to the 
garbage collection pressure.

Can we come up with a different way of keeping track of intermediate values? 
How about creating a data class that lets you accumulate values (as opposed to 
having a {{List}})? That way, a single live container needs only one data class 
object. Also, we won't have to store things like vmem which we're not tracking 
anyway.

(3) edge cases
What if we are still accumulating (without publishing) when the container is 
finished? I think we simply don't publish? I suspect that might be OK. We might 
want to make it explicit with a little comments. Any other edge case we need to 
consider?

Onto more line-level comments:
(ContainersMonitorImpl.java)
- l. 85: It should be private. Also, let's instantiate this map in 
{{serviceInit()}} only if NM timeline publisher is enabled (i.e. timeline 
service v.2 is enabled).
- l.579: I am puzzled by this. Why are we re-defining this memory metric to be 
in MB? Is this necessary as part of this patch? If there is no reason to 
redefine the unit, we should go back to bytes...

> Have a separate NM timeline publishing-interval
> -----------------------------------------------
>
>                 Key: YARN-4821
>                 URL: https://issues.apache.org/jira/browse/YARN-4821
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-4821-YARN-2928.v1.001.patch
>
>
> Currently the interval with which NM publishes container CPU and memory 
> metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose 
> default is 3 seconds. This is too aggressive.
> There should be a separate configuration that controls how often 
> {{NMTimelinePublisher}} publishes container metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to