[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231648#comment-15231648 ]
Li Lu commented on YARN-3816:
-----------------------------
Thanks [~sjlee0]! Yes, I did use the words "accumulation" and "aggregation"
interchangeably, and I can certainly correct this in the follow-up
patch. However, I think you may have overlooked one key change in the latest (v5)
patch (due to the word "accumulation"). In this patch, my main focus is to
implement aggregation (aggregating container metrics to the application level),
even though the API for TimelineMetric is called "accumulate". Aggregating
metrics from all containers into one application is performed in the timeline
collector, using an internal Map called aggregationGroups. In this map, we
maintain the aggregation status for each "group" (right now I use entity_type
as the group key, so all CONTAINER type entities are grouped together). Within
one aggregation group, we maintain the metric status for each entity_id (each
container id). On aggregation, for each aggregation group (like the CONTAINER
entity type),
for each existing metric (like HDFS_BYTES_WRITE), we iterate through all known
entity ids (containers) and perform the aggregation operation defined in the
metric's realtimeAggregationOp field.
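
To make the structure above concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the v5 patch: the class name, the {{Op}} enum with its SUM and MAX values, and the method signatures are all hypothetical, and metric values are simplified to a single long per entity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and signatures are illustrative, not from the patch.
class AggregationGroupsSketch {
  // Stand-in for the operation held in a metric's realtimeAggregationOp field.
  enum Op { SUM, MAX }

  // Aggregation status of one metric: its op plus the latest value per entity id.
  static final class MetricStatus {
    final Op op;
    final Map<String, Long> perEntity = new HashMap<>();
    MetricStatus(Op op) { this.op = op; }
  }

  // group key (entity type, e.g. "CONTAINER") -> metric id -> status
  private final Map<String, Map<String, MetricStatus>> aggregationGroups =
      new HashMap<>();

  // Records the latest value reported by one entity (container) for one metric.
  void update(String entityType, String entityId, String metricId,
      Op op, long value) {
    aggregationGroups
        .computeIfAbsent(entityType, k -> new HashMap<>())
        .computeIfAbsent(metricId, k -> new MetricStatus(op))
        .perEntity.put(entityId, value);
  }

  // For each metric in the group, folds the values of all known entity ids
  // with that metric's op and returns metric id -> application-level value.
  Map<String, Long> aggregate(String entityType) {
    Map<String, Long> appLevel = new HashMap<>();
    Map<String, MetricStatus> group = aggregationGroups.get(entityType);
    if (group == null) {
      return appLevel;
    }
    for (Map.Entry<String, MetricStatus> e : group.entrySet()) {
      MetricStatus status = e.getValue();
      Long result = null;
      for (long v : status.perEntity.values()) {
        if (result == null) {
          result = v;
        } else {
          result = status.op == Op.SUM ? result + v : Math.max(result, v);
        }
      }
      if (result != null) {
        appLevel.put(e.getKey(), result);
      }
    }
    return appLevel;
  }
}
```

In this sketch, if two containers report HDFS_BYTES_WRITE values of 120 and 50 with a SUM op, aggregating the CONTAINER group yields 170 for that metric.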
Contrary to your comment, accumulation is actually the part missing from this
draft patch. When we update the state for one container on one metric, we
simply replace the previous value (in AggregationStatus#update,
{{aggrRow.getPerEntityMetrics().put(entityId, m);}}). We can add methods to
perform time-based accumulation later (reusing the "accumulate" method's name).
BTW, by default a metric's aggregation op field is set to NOP, so such metrics
are not kept in the aggregation status table.
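
To illustrate the difference between replacing and accumulating, here is a small sketch. It assumes one possible reading of "time-based accumulation": integrating the previous value over the time elapsed between updates. That reading, the class name, and the fields are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch; not the patch's AggregationStatus implementation.
class AccumulationSketch {
  private long lastTimestampMs;
  private long lastValue;
  private long accumulated; // sum of value * elapsed milliseconds
  private boolean seen;

  // Behaviour described for the draft patch: the new value replaces the old.
  void replace(long timestampMs, long value) {
    lastTimestampMs = timestampMs;
    lastValue = value;
    seen = true;
  }

  // Assumed time-based accumulation: before replacing, add the previous value
  // weighted by the time it was in effect. Timestamps are assumed to be
  // non-decreasing.
  void accumulate(long timestampMs, long value) {
    if (seen) {
      accumulated += lastValue * (timestampMs - lastTimestampMs);
    }
    replace(timestampMs, value);
  }

  long getLastValue() { return lastValue; }

  long getAccumulated() { return accumulated; }
}
```

With replace only, the state after each update is just the latest value. With accumulate, the same updates also build up a running total along the time dimension, which could later be aggregated across containers in the same way as any other metric.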
Given the tight timeframe, we can certainly sync up offline if needed. Thanks!
> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Li Lu
> Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf,
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch,
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch,
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch,
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch,
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch,
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end users with aggregated states for each application, including:
> resource (CPU, Memory) consumption across all containers, number of
> containers launched/completed/failed, etc. We need this for apps while they
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be
> aggregated to show details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when
> based on application-level aggregations rather than raw entity-level data,
> since far fewer rows need to be scanned (non-aggregated entities such as
> events, configurations, etc. are filtered out).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)