[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231648#comment-15231648
 ] 

Li Lu commented on YARN-3816:
-----------------------------

Thanks [~sjlee0]! Yes, I did use the words "accumulation" and "aggregation" 
interchangeably, and I can certainly correct this in the follow-up patch. 
However, I think you may have overlooked one key change in the latest (v5) 
patch (because of the word "accumulation"). In this patch, my main focus is to 
implement aggregation (aggregating container metrics to the application 
level), even though the API on TimelineMetric is called "accumulate". 
Aggregating metrics from all containers into one application is performed in 
the timeline collector, using an internal Map called aggregationGroups. In 
this map, we maintain the aggregation status for each "group" (right now I use 
entity_type as the group key, so all CONTAINER type entities are grouped 
together). Within one aggregation group, we maintain the metric status for 
each entity_id (each container id). On aggregation, for each aggregation group 
(like the CONTAINER entity type) and for each existing metric (like 
HDFS_BYTES_WRITE), we iterate through all known entity ids (containers) and 
perform the aggregation operation defined in the metric's 
realtimeAggregationOp field. 
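
To make the structure above concrete, here is a minimal, self-contained sketch 
of the group -> metric -> entity layout and the aggregation pass. It is not 
the code from the patch: the class AggregationSketch, the AggregationOp enum 
and its SUM/MAX values, and the use of plain long values are all assumptions 
made for illustration. Only the names aggregationGroups and NOP come from the 
description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the op stored in a metric's realtimeAggregationOp field.
enum AggregationOp { NOP, SUM, MAX }

// Illustrative only; the real collector and its types live in the patch.
class AggregationSketch {
    // group (entity type) -> metric id -> entity id (container id) -> latest value
    private final Map<String, Map<String, Map<String, Long>>> aggregationGroups =
        new HashMap<>();
    // metric id -> aggregation op (assumed to be fixed per metric)
    private final Map<String, AggregationOp> ops = new HashMap<>();

    void update(String entityType, String entityId, String metricId,
                long value, AggregationOp op) {
        if (op == AggregationOp.NOP) {
            return; // NOP metrics are not kept in the aggregation status
        }
        ops.put(metricId, op);
        aggregationGroups
            .computeIfAbsent(entityType, k -> new HashMap<>())
            .computeIfAbsent(metricId, k -> new HashMap<>())
            .put(entityId, value); // latest value replaces the previous one
    }

    // Aggregate every tracked metric of one group across all known entities.
    Map<String, Long> aggregate(String entityType) {
        Map<String, Long> result = new HashMap<>();
        Map<String, Map<String, Long>> group = aggregationGroups.get(entityType);
        if (group == null) {
            return result; // nothing recorded for this group
        }
        for (Map.Entry<String, Map<String, Long>> metric : group.entrySet()) {
            AggregationOp op = ops.get(metric.getKey());
            boolean first = true;
            long acc = 0L;
            for (long v : metric.getValue().values()) {
                if (first) {
                    acc = v;
                    first = false;
                } else if (op == AggregationOp.SUM) {
                    acc += v;
                } else {
                    acc = Math.max(acc, v);
                }
            }
            result.put(metric.getKey(), acc);
        }
        return result;
    }
}
```

In this sketch, if container_1 reports HDFS_BYTES_WRITE as 100 and then 120, 
and container_2 reports 50, aggregating the CONTAINER group with SUM gives 170, 
because only the latest value per container is kept.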

Contrary to your comment, accumulation is actually the part missing from this 
draft patch. When we update the state of one metric for one container, we 
simply replace the previous value (in AggregationStatus#update, 
{{aggrRow.getPerEntityMetrics().put(entityId, m);}}). We can add methods to 
perform time-based accumulation later (reusing the "accumulate" method name). 
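
The difference between the two update styles can be sketched as follows. The 
replace method mirrors the put call quoted above; the accumulate method is 
only one possible reading of "accumulation" (a running sum of readings) and is 
purely hypothetical, as are the class name UpdateSketch and the use of long 
values instead of TimelineMetric objects.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only; not the AggregationStatus class from the patch.
class UpdateSketch {
    // entity id (container id) -> current value of one metric
    final Map<String, Long> perEntityMetrics = new HashMap<>();

    // Draft-patch behavior as described: the latest reading wins.
    void replace(String entityId, long value) {
        perEntityMetrics.put(entityId, value);
    }

    // Hypothetical accumulation: fold each new reading into a running total.
    void accumulate(String entityId, long value) {
        perEntityMetrics.merge(entityId, value, Long::sum);
    }
}
```

With readings 10 and then 25 for the same container, replace leaves 25 in the 
map while accumulate leaves 35.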

BTW, by default a metric's aggregation op field is set to NOP, so such metrics 
are not kept in the aggregation status table. 

Given the tight timeframe, we can certainly sync up offline if needed. Thanks! 


> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: yarn-2928-1st-milestone
>         Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users with aggregated states for each application, 
> including: resource (CPU, Memory) consumption across all containers, number 
> of containers launched/completed/failed, etc. We need this for apps while 
> they are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when 
> based on application-level aggregations rather than raw entity-level data, 
> as far fewer rows need to be scanned (after filtering out non-aggregated 
> entities such as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
