[
https://issues.apache.org/jira/browse/YARN-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951228#comment-15951228
]
Varun Saxena edited comment on YARN-6375 at 3/31/17 4:37 PM:
-
Thanks [~vrushalic] for the comments.
I was primarily going by the metrics we currently report, and what we can
support at this point in time.
The two examples you pointed out basically come down to the difference between
how we should treat a gauge vs. a counter.
As you said, for CPU and memory it probably makes sense to do it the way I have
done in the patch. This is what I primarily had in mind while raising the JIRA.
But for counters, the current mechanism might be better.
Counters such as HDFS bytes read can be aggregated at the application level by
the AM itself, as is currently done by the MapReduce AM. However, I do see
merit in aggregating them ourselves on the collector side.
If we do say that we will sum up metrics across the lifetime of an app, then we
would need to handle that on collector restart as well. Do we store all the
metrics in the state store and read them back upon restart? Or we would
probably need some kind of edit-log mechanism.
We probably need some way of identifying the metric type (gauge vs. counter),
and then clear gauges but not counters upon aggregation. Maybe indicate it in
the TimelineMetric object and rely on the client reporting it in a consistent
fashion?
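To make the edit-log idea above concrete, here is a minimal sketch of what such a mechanism could look like: counter updates are appended to a file as they arrive, and replayed to rebuild aggregated state after a collector restart. The class name, record format, and method names are illustrative assumptions, not anything from the patch or from the Timeline Service code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an edit log for counter metrics: each update is
// appended as a "metricId delta" line, and replay() recovers per-metric
// sums after a restart. Illustrative only; not the actual patch design.
public class CounterEditLog {
    private final Path log;

    CounterEditLog(Path log) {
        this.log = log;
    }

    // Append one counter delta as a single line.
    void append(String metricId, long delta) throws IOException {
        Files.write(log, (metricId + " " + delta + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // On restart, replay the whole log to recover aggregated counter state.
    Map<String, Long> replay() throws IOException {
        Map<String, Long> sums = new HashMap<>();
        if (!Files.exists(log)) {
            return sums;
        }
        for (String line : Files.readAllLines(log)) {
            String[] parts = line.split(" ");
            sums.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
        }
        return sums;
    }
}
```

A real implementation would also need periodic checkpointing so the log does not grow without bound, which is essentially the state-store question raised above.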
Or adopt the way we aggregate metrics for a flow run?
This probably needs to be discussed further.
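To illustrate the gauge-vs-counter distinction discussed above, here is a hypothetical sketch of an app-level aggregator that clears gauges after each cycle but retains counters. The `MetricKind` flag and all class/method names are assumptions for illustration; the actual TimelineMetric object does not currently carry such a field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tag each metric as GAUGE or COUNTER so app-level
// aggregation can clear gauges (a finished container's stale CPU reading
// should stop contributing) but keep counters (a finished container's
// final byte count should still be part of the sum).
public class AppLevelAggregator {
    enum MetricKind { GAUGE, COUNTER }

    // metricId -> (entityId -> latest reported value)
    private final Map<String, Map<String, Long>> latest = new HashMap<>();
    private final Map<String, MetricKind> kinds = new HashMap<>();

    void report(String entityId, String metricId, MetricKind kind, long value) {
        kinds.put(metricId, kind);
        latest.computeIfAbsent(metricId, k -> new HashMap<>())
              .put(entityId, value);
    }

    // One aggregation cycle: sum each metric across entities, then drop
    // gauge readings so entities that stop reporting no longer contribute.
    Map<String, Long> aggregate() {
        Map<String, Long> out = new HashMap<>();
        latest.forEach((metricId, perEntity) ->
            out.put(metricId, perEntity.values().stream()
                                       .mapToLong(Long::longValue).sum()));
        latest.entrySet().removeIf(
                e -> kinds.get(e.getKey()) == MetricKind.GAUGE);
        return out;
    }
}
```

This assumes the client reports the kind consistently, which is exactly the concern raised above.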
bq. for CPU, perhaps we might want to know across this application what was the
cpu used by all the containers in the lifetime of this app?
This would be better handled by time-weighted accumulation, which is future
scope. Just summing up metrics of all entities running across different
timelines may not be very useful for metrics such as CPU/memory. For instance:
* App1 has 3 containers running one after the other for 10 seconds each, i.e.
the app runs for 30 seconds. While running, each container reports CPU of 20%
each time.
* App2 has 3 containers, but they run in parallel for 30 seconds. Similar to
above, each container reports CPU of 20% each time.
Now, aggregation as it is done today, at a 15-second period, would lead to a
value of 120% by the end in both cases. But the two apps and their resource
requirements are very distinct.
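The distinction the example draws can be made concrete with time-weighted accumulation, i.e. integrating CPU over each container's lifetime. The sketch below just works through the arithmetic for the two hypothetical apps; the class and method names are illustrative, not from any existing code.

```java
// Time-weighted accumulation distinguishes the two hypothetical apps:
// summing instantaneous 20% readings gives the same total for both,
// but integrating CPU over each container's running time does not.
public class TimeWeightedCpu {
    // CPU "area under the curve" for one container, in percent-seconds.
    static long cpuPercentSeconds(long cpuPercent, long runningSeconds) {
        return cpuPercent * runningSeconds;
    }

    public static void main(String[] args) {
        // App1: 3 containers back to back, 10 seconds each at 20% CPU.
        long app1 = 3 * cpuPercentSeconds(20, 10);  // 600 percent-seconds
        // App2: 3 containers in parallel for 30 seconds, each at 20% CPU.
        long app2 = 3 * cpuPercentSeconds(20, 30);  // 1800 percent-seconds
        System.out.println("App1=" + app1 + " App2=" + app2);
    }
}
```

App2 comes out at three times App1's CPU usage, reflecting its genuinely larger resource footprint, whereas naive summation reports 120% for both.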
> App level aggregation should not consider metric values reported in the
> previous aggregation cycle
> --
>
> Key: YARN-6375
> URL: https://issues.apache.org/jira/browse/YARN-6375
> Project: Hadoop YARN
>