[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

Sangjin Lee (JIRA) Thu, 07 Apr 2016 21:51:55 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231626#comment-15231626
 ]


Sangjin Lee commented on YARN-3816:
-----------------------------------

Onto code-level comments...

First, there seem to be checkstyle violations and javadoc errors. Could you 
please fix them?

(RealTimeAggregationOperation.java)
- As mentioned in the above comment, this really appears to be about 
"accumulation". We should rename things here to "accumulation".
- l.36: We don’t need to update {{state}} for MAX? Could you explain how 
{{state}} is supposed to be used?
- I don’t think I understand {{SUM.exec()}}. Maybe some comment in the code (or 
a JIRA comment) could be helpful.
- l.116: There is no need for a separate interface ({{Operation}}). The 
{{exec()}} method can simply belong in {{RealTimeAggregationOperation}} itself.

(TimelineMetric.java)
- l.105: This is an unrelated issue with this patch, but I’m not sure what’s 
going on with the else clause in l.104-106 in the {{setValues()}} method. Could 
you look at it and fix it if it is not right?
- l.183: we should use {{StringBuilder}} (unsynchronized) over {{StringBuffer}} 
(synchronized)
- l.191: I would say use “get” instead of “retrieve” for these method names...
- l.192: nit: since this is an enum, {{==}} is sufficient (no need for 
{{equals()}}); the same for l.206 and 220
- l.196: It should be {{firstKey()}} because it’s reverse sorted, right? We’re 
looking for the latest timestamp.
- l.205: the name “key” is bit obscure. What we mean is the timestamp for the 
value. Should we rename this to {{getSingleDataTimestamp()}}?

(TimelineMetricCalculator.java)
- l.38: typo: “Number be compared” -> “Number to be compared”. The same with 
l.71
- l.41: nit: need a space before the opening brace
- l.76: same as above
- l.68: We stated that we will support only longs as the metric value type for 
now (and maybe double later). In any case, I think it’s safe to say we need not 
support ints. Should we simplify this by casting ints to longs if we see them?
- l.109: do we need to check for both being null?
- l.145: I think we should check to ensure time > 0. Also, it might be easier 
if we specify time as {{long}} instead of {{Long}}.
- l.151: wouldn’t it be easier if we called {{sum()}} to handle the summation 
part instead of implementing the summing logic here again?
- l.194: nit: space before the brace

(TimelineCollector.java)
- l.59-69: nit: let’s group all statics at the beginning and place instance 
members after them
- the executor should be shut down properly in {{serviceStop()}}, or it will 
leave those threads hanging around
- l.129: nit: we don’t need to specify {{TimelineCollector}} in calling the 
static methods (in several places here)
- l.218: nit: let’s surround it with {{LOG.isDebugEnabled()}}
- l.237-241: This is bit of an anti-pattern for using a {{ConcurrentHashMap}}. 
The issue is if multiple threads find that {{aggrRow}} is null and try to put 
their copies to the {{aggregateTable}} map, there is a race. As a result, you 
may start operating on an instance that will not be stored in the map 
eventually. You should use the {{putIfAbsent()}} method to make sure multiple 
threads always agree on the stored instance after the operation.
- l.247: nit: let’s use ==
- l.258: nit: let’s use ==

(TimelineReaderWebServices.java)
- Are the imports needed? There are no other code changes in this file?

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: yarn-2928-1st-milestone
>         Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

Reply via email to