[
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593867#comment-14593867
]
Joep Rottinghuis commented on YARN-3815:
----------------------------------------
Very much agree with the separation into two categories, "online" versus
"periodic". I think this will be a natural split between the native HBase
tables for the former and the Phoenix approach for the latter, letting each
play to its relative strengths.
A few thoughts around time-based aggregations:
- If the aggregation time is smaller than the runtime of apps/flows we need to
consider what that means for an aggregate. As an extreme example consider
hourly aggregates for applications that take hours to complete. What do we
actually count in that one hour? Do we only attribute to that hour the specific
total metric that came in at that time, or do we try to apportion part of the
increment to what happened only in that one hour? The same goes for daily
aggregates when we have long-running jobs. In hRaven we sidestep this entirely
by making the simplifying assumption that all metrics and usage happen in the
instant the job completes. With ATSv2 being (near) real-time that simply won't
work, so we need to decide what an aggregate means in that case. Are we
requiring apps to write at least once within each aggregation period?
- If we store aggregates in columns (hourly columns, daily columns) we need to
limit the growth of # columns by making the next level aggregate part of the
rowkey. This would limit 24 hourly columns to a single day row. Similarly we'd
have 7 dailies in a week, or perhaps just up to 31 dailies in a month. All of
these considerations come from the strong need to limit the range over which we
scan in order to get reasonable performance in the face of lots of data.
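To make the first point concrete, here is a minimal sketch (hypothetical
function and bucket layout, not the ATSv2 implementation) of the "apportion"
option: given a series of cumulative metric snapshots from a long-running app,
credit only each increment to the hour in which it was reported, rather than
attributing the whole running total to the completion instant as hRaven does.

```python
from collections import defaultdict

HOUR_MS = 3600 * 1000

def apportion_to_hours(snapshots):
    """Attribute the increase between consecutive cumulative metric snapshots
    to the hour bucket in which each increase was reported.
    snapshots: list of (timestamp_ms, cumulative_value), ordered by time."""
    buckets = defaultdict(int)
    prev = 0
    for ts, total in snapshots:
        hour = ts - (ts % HOUR_MS)       # truncate to the hour boundary
        buckets[hour] += total - prev    # only the increment, not the total
        prev = total
    return dict(buckets)

# A 3-hour app reporting a cumulative counter once per hour:
snaps = [(1 * HOUR_MS, 100), (2 * HOUR_MS, 250), (3 * HOUR_MS, 250)]
# -> hour 1 is credited 100, hour 2 is credited 150, hour 3 is credited 0
```

Note this only works if the app writes at least once per aggregation period;
otherwise an entire multi-hour increment still collapses into a single bucket,
which is exactly the question raised above.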
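The second point, pushing the next aggregation level into the rowkey, can be
sketched as follows (the `!` separator, key ordering, and helper name are
illustrative assumptions, not the actual schema): the day becomes part of the
rowkey and the hour becomes the column qualifier, capping each row at 24
columns and keeping range scans tight.

```python
DAY_MS = 24 * 3600 * 1000

def hourly_agg_cell(cluster, user, flow, ts_ms):
    """Place an hourly aggregate in a daily row: the day lands in the rowkey,
    the hour-of-day becomes the column qualifier (at most 24 per row)."""
    day = ts_ms - (ts_ms % DAY_MS)                 # day boundary in the rowkey
    hour_of_day = (ts_ms % DAY_MS) // (3600 * 1000)
    row_key = f"{cluster}!{user}!{flow}!{day}"
    column = f"h{hour_of_day:02d}"
    return row_key, column

# 13:00 on the epoch day lands in the day-0 row, column "h13":
# hourly_agg_cell("cluster1", "joep", "DailyActiveUsers", 13 * 3600 * 1000)
# -> ("cluster1!joep!DailyActiveUsers!0", "h13")
```

The same shape gives up-to-31 daily columns in a monthly row, keeping column
growth bounded at every level.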
{quote}
Flow level:
○ expect return: aggregated stats for a flow_run, flow_version and flow
{quote}
I think "flow" level aggregations should really only mean flow-run level
aggregation in the sense of the separation that [~sjlee0] mentioned above for
HBase native online aggregations. I'm not sure that flow_version rollups even
make sense. The flow_version is important to be able to pass in as a filter:
give me stats for this flow matching only this version. This is useful for
cases such as reducer estimation, where a job can make effective use of
previous-run data only if the version of the flow hasn't changed. The fact that
there were three versions of a Hive query is good to know, as is when each
version first appeared; knowing the total cost for version 2 is probably less
useful.
Flow level aggregates are useful only with a particular timerange in mind. What
was the cost for the DailyActiveUsers job (no matter the version) for the last
week? How many bytes did the SearchJob read from HDFS in the last month?
Thoughts around queue level aggregation (in addition to Sangjin's comments that
these should be time-based):
Queue-level aggregates have additional complexities. First, queues can come and
go very quickly, and apps can be moved from queue to queue. For normal,
shorter-lived applications it might be tempting to use the final queue that a
job ran in (this is the assumption we make in hRaven). With long-running apps
this assumption breaks down.
If an app runs for an hour and accumulates some value X for a metric Y, it
will be recorded as such in the aggregate for the original queue. If the
application then gets moved and the value of metric Y grows to Z, are we going
to aggregate Z-X in the new queue, or simply all of Z? In the latter case the
sums of metric Y across the queues will not match the sums across apps or
flows.
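The arithmetic of that choice can be spelled out in a few lines (the function
and its parameters are purely illustrative): crediting only the increment Z-X
to the new queue keeps the queue sums equal to the app total, while naively
crediting the whole Z double-counts the X already attributed to the old queue.

```python
def credit_after_move(queue_aggs, new_q, x_at_move, z_now, attribute_delta=True):
    """After an app (running total now Z, and X when it moved) lands in a
    new queue, credit either only the increment Z - X, or naively all of Z."""
    inc = (z_now - x_at_move) if attribute_delta else z_now  # naive double-counts X
    queue_aggs[new_q] = queue_aggs.get(new_q, 0) + inc
    return queue_aggs

# App accumulated X=100 in queueA, then grew to Z=160 after moving to queueB:
# delta attribution:  {"queueA": 100, "queueB": 60}   -> queue sums = 160 = Z
# naive attribution:  {"queueA": 100, "queueB": 160}  -> queue sums = 260 != Z
```

Delta attribution requires recording the metric value at move time, which is
extra bookkeeping the online path would have to carry.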
In addition, queues can grow and shrink on the fly. Are we going to record
that? At the very least we need to prefix the cluster in the rowkey so that we
can differentiate queues with the same name on different clusters.
And then there are hierarchical queues. Are we thinking of rolling stats up to
each level, or just into the individual leaf queue? Will we structure the
rowkeys so that we can do prefix scans for queues such as
/cluster/parent/childa and /cluster/parent/childb?
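The prefix-scan idea for hierarchical queues can be sketched like this (a plain
dict stands in for the table, and the key layout is an assumption): if the full
queue path leads the rowkey, one scan on /cluster/parent/ covers every child
queue beneath it, so parent-level rollups need no separately maintained rows.

```python
def queues_under(rows, prefix):
    """Prefix scan over hierarchical queue rowkeys: every row whose key
    starts with /cluster/parent/ belongs to some child of that queue."""
    return {k: v for k, v in rows.items() if k.startswith(prefix)}

rows = {
    "/cluster/parent/childa!metricY": 10,
    "/cluster/parent/childb!metricY": 20,
    "/cluster/other/childc!metricY": 5,
}
# queues_under(rows, "/cluster/parent/") -> the two child rows; their sum, 30,
# is the parent-level rollup
```

The trade-off is read-time summation at the parent versus write-time rollups
into rows at each level of the hierarchy.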
> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>
> Key: YARN-3815
> URL: https://issues.apache.org/jira/browse/YARN-3815
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: Timeline Service Nextgen Flow, User, Queue Level
> Aggregations (v1).pdf
>
>
> Per previous discussions in some design documents for YARN-2928, the basic
> scenario is the query for stats can happen on:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version
> and flow
> - User level, expect return: aggregated stats for applications submitted by
> user
> - Queue level, expect return: aggregated stats for applications within the
> Queue
> Application state is the basic building block for all other levels of
> aggregation. We can provide Flow/User/Queue level aggregated statistics
> based on application state (a dedicated table for application state is
> needed, which is missing from previous design documents like the
> HBase/Phoenix schema design).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)