[
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596129#comment-14596129
]
Junping Du commented on YARN-3815:
----------------------------------
Thanks [~sjlee0] and [~jrottinghuis] for the review and the detailed comments.
[~jrottinghuis]'s comments are pretty long, so I could only reply to part of
them today; I will finish the remaining parts tomorrow. :)
bq. For framework-specific metrics, I would say this falls on the individual
frameworks. The framework AM usually already aggregates them in memory
(consider MR job counters for example). So for them it is straightforward to
write them out directly onto the YARN app entities. Furthermore, it is
problematic to add them to the sub-app YARN entities and ask YARN to aggregate
them to the application. Framework’s sub-app entities may not even align with
YARN’s sub-app entities. For example, in case of MR, there is a reasonable
one-to-one mapping between a mapper/reducer task attempt and a container, but
for other applications that may not be true. Forcing all frameworks to hang
values at containers may not be practical. I think it’s far easier for
frameworks to write aggregated values to the YARN app entities.
The AM currently leverages YARN's AppTimelineCollector to forward entities to
the backend storage, so having the AM talk directly to the backend storage is
not considered safe. It is also not necessary, because the real difficulty here
is aggregating framework-specific metrics at the other levels (flow, user and
queue); that goes beyond the life cycle of the framework, so YARN has to take
care of it. Instead of asking frameworks to handle their specific metrics
themselves, I would like to propose treating these metrics as "anonymous": the
framework would pass both the metric name and value to YARN's collector, and
the collector could aggregate them and store them as dynamic columns (under a
framework_specific_metrics column family) in the app states table. Aggregation
of framework metrics at the other levels (flow, user, etc.) could then happen
based on this.
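To make this more concrete, below is a rough sketch (not a patch; the table
name "timelineservice.app_states", the "fsm" column family and the class
itself are made up for illustration) of how the per-app collector could fold
an anonymous framework metric into the app states table as a dynamic column:
{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Illustrative only: accumulates framework-specific ("anonymous") metrics
 * reported to the per-app collector as (name, value) pairs, and flushes them
 * as dynamic columns under a hypothetical "fsm" (framework_specific_metrics)
 * column family of a hypothetical app states table.
 */
public class AnonymousMetricAggregator {
  private static final byte[] FSM_FAMILY = Bytes.toBytes("fsm");
  private final Map<String, Long> sums = new ConcurrentHashMap<>();

  /** Called for every metric the framework passes by name and value only. */
  public void accumulate(String metricName, long value) {
    sums.merge(metricName, value, Long::sum);
  }

  /** Flush the current aggregates for this application as dynamic columns. */
  public void flush(Connection conn, byte[] appRowKey) throws IOException {
    try (Table appStates =
        conn.getTable(TableName.valueOf("timelineservice.app_states"))) {
      Put put = new Put(appRowKey);
      for (Map.Entry<String, Long> e : sums.entrySet()) {
        // The metric name itself becomes the column qualifier, so YARN never
        // needs to know the framework's metric schema in advance.
        put.addColumn(FSM_FAMILY, Bytes.toBytes(e.getKey()),
            Bytes.toBytes(e.getValue()));
      }
      appStates.put(put);
    }
  }
}
{code}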
bq. app-to-flow online aggregation. This is more or less live aggregated
metrics at the flow level. This will still be based on the native HBase schema.
About flow online aggregation, I am not quite sure about the requirement yet.
Do we really want real time for flow-aggregated data, or would some
fine-grained time interval (like 15 secs) be good enough? If the goal is to
show some nice metrics charts for a flow, the latter should be fine. Even for
real time, we don't have to aggregate everything from the raw entity table; in
particular, we don't have to count the metrics of finished apps again on every
pass, do we?
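To illustrate what I mean by not counting finished apps again (all class and
method names below are made up, this is only a sketch), an interval-based flow
aggregator would fold each finished app into a running total exactly once and
only re-read the apps that are still running:
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Illustrative only: interval-based flow aggregation that avoids re-counting
 * metrics of finished applications on every pass.
 */
public class FlowIntervalAggregator {
  /** Totals already contributed by apps that have finished. */
  private final Map<String, Long> finishedTotals = new HashMap<>();
  private final Set<String> foldedApps = new HashSet<>();

  /** Called once per interval (e.g. every 15 seconds). */
  public Map<String, Long> aggregate(Iterable<AppSnapshot> apps) {
    Map<String, Long> flowTotals = new HashMap<>(finishedTotals);
    for (AppSnapshot app : apps) {
      if (app.finished && foldedApps.add(app.appId)) {
        // Fold a finished app into the permanent totals exactly once.
        app.metrics.forEach((k, v) -> finishedTotals.merge(k, v, Long::sum));
        app.metrics.forEach((k, v) -> flowTotals.merge(k, v, Long::sum));
      } else if (!app.finished) {
        // Running apps contribute their latest aggregated values each pass.
        app.metrics.forEach((k, v) -> flowTotals.merge(k, v, Long::sum));
      }
    }
    return flowTotals;
  }

  /** Minimal app-level view assumed for the sketch. */
  public static class AppSnapshot {
    final String appId;
    final boolean finished;
    final Map<String, Long> metrics;

    AppSnapshot(String appId, boolean finished, Map<String, Long> metrics) {
      this.appId = appId;
      this.finished = finished;
      this.metrics = metrics;
    }
  }
}
{code}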
bq. (3) time-based flow aggregation: This is different than the online
aggregation in the sense that it is aggregated along the time boundary (e.g.
“daily”, “weekly”, etc.). This can be based on the Phoenix schema. This can be
populated in an offline fashion (e.g. running a mapreduce job).
Any special reason not to handle this the same way as above, i.e. as an HBase
coprocessor? It just sounds like a coarser-grained time interval, doesn't it?
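Purely as an illustration (the bucket sizes and the row-key layout below are
assumptions, not the real schema), the same write path could simply switch to
a coarser time bucket instead of requiring a separate offline MapReduce job:
{code:java}
import java.util.concurrent.TimeUnit;

/**
 * Illustrative only: deriving the row-key time bucket for flow aggregation.
 * A "daily" or "weekly" aggregate is the same write path with a
 * coarser-grained bucket.
 */
public final class TimeBuckets {
  private TimeBuckets() {}

  /** 15-second bucket used for near-real-time flow aggregation. */
  public static long fineBucket(long timestampMs) {
    return timestampMs / TimeUnit.SECONDS.toMillis(15);
  }

  /** Daily bucket for time-based flow aggregation. */
  public static long dailyBucket(long timestampMs) {
    return timestampMs / TimeUnit.DAYS.toMillis(1);
  }

  /** Hypothetical row key: flow identifier plus inverted bucket for scans. */
  public static String rowKey(String cluster, String flow, long bucket) {
    return cluster + "!" + flow + "!" + (Long.MAX_VALUE - bucket);
  }
}
{code}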
bq. This is another “offline” aggregation type. Also, I believe we’re talking
about only time-based aggregation. In other words, we would aggregate values
for users only with a well-defined time window. There won’t be a “real-time”
aggregation of values, similar to the flow aggregation.
I would also call for a fine-grained time interval (close to real time) here,
because the aggregated resource metrics per user could be used for billing
Hadoop usage in a shared environment (whether a private or public cloud), so
users need more detail on their resource consumption, especially around random
peak times.
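For example, the kind of per-user roll-up I have in mind for billing is
roughly as below (the memory-/vcore-seconds arithmetic follows the usual
resource-usage convention; the class and its methods are made up for
illustration):
{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative only: per-user resource usage over one fine-grained window,
 * suitable as input to billing/chargeback.
 */
public class UserUsageWindow {
  /** user -> accumulated MB-seconds in this window. */
  private final Map<String, Long> memoryMbSeconds = new HashMap<>();
  /** user -> accumulated vcore-seconds in this window. */
  private final Map<String, Long> vcoreSeconds = new HashMap<>();

  /**
   * Record one container's usage inside the window.
   *
   * @param user       application submitter
   * @param memoryMb   memory allocated to the container
   * @param vcores     vcores allocated to the container
   * @param durationMs how long the container ran within this window
   */
  public void add(String user, long memoryMb, int vcores, long durationMs) {
    long seconds = durationMs / 1000;
    memoryMbSeconds.merge(user, memoryMb * seconds, Long::sum);
    vcoreSeconds.merge(user, vcores * seconds, Long::sum);
  }

  public long getMemoryMbSeconds(String user) {
    return memoryMbSeconds.getOrDefault(user, 0L);
  }

  public long getVcoreSeconds(String user) {
    return vcoreSeconds.getOrDefault(user, 0L);
  }
}
{code}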
bq. Very much agree with separation into 2 categories "online" versus
"periodic". I think this will be natural split between the native HBase tables
for the former and the Phoenix approach for the latter to each emphasize their
relative strengths.
I would question the necessity of "online" again if it means "real time"
rather than a fine-grained time interval. As a building block, every container
metric (CPU, memory, etc.) is generated at a time interval rather than in real
time. As a result, we never know the exact snapshot of the whole system at a
precise point in time; we can only try to get closer.
> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>
> Key: YARN-3815
> URL: https://issues.apache.org/jira/browse/YARN-3815
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: Timeline Service Nextgen Flow, User, Queue Level
> Aggregations (v1).pdf
>
>
> Per previous discussions in some design documents for YARN-2928, the basic
> scenario is that the query for stats can happen at:
> - Application level, expected return: an application with aggregated stats
> - Flow level, expected return: aggregated stats for a flow_run, flow_version
> and flow
> - User level, expected return: aggregated stats for applications submitted by
> the user
> - Queue level, expected return: aggregated stats for applications within the
> queue
> Application states are the basic building block for all other levels of
> aggregation. We can provide Flow/User/Queue level aggregated statistics based
> on application states (a dedicated table for application states is needed,
> which is missing from previous design documents such as the HBase/Phoenix
> schema design).