Joep Rottinghuis commented on YARN-3815:

Very much agree with the separation into 2 categories, "online" versus 
"periodic". I think this will be a natural split between the native HBase 
tables for the former and the Phoenix approach for the latter, so that each can 
emphasize its relative strengths.

A few thoughts around time-based aggregations:
- If the aggregation time is smaller than the runtime of apps/flows we need to 
consider what that means for an aggregate. As an extreme example consider 
hourly aggregates for applications that take hours to complete. What do we 
actually count in that one hour? Do we only attribute to that hour the specific 
total metric that came in at that time, or do we try to apportion part of the 
increment to what happened only in that one hour? The same goes for daily 
aggregates when we have long running jobs. In hRaven we simply don't deal with 
this at all: we make the simplifying assumption that all metrics and usage 
happen in the instant that the job completes. With ATSv2 being (near) 
real-time that will simply not work, so we need to consider what that means. 
Are we requiring apps to write at least once within each aggregation period?
- If we store aggregates in columns (hourly columns, daily columns) we need to 
limit the growth of # columns by making the next level aggregate part of the 
rowkey. This would limit 24 hourly columns to a single day row. Similarly we'd 
have 7 dailies in a week, or perhaps just up to 31 dailies in a month. All of 
these considerations come from a strong need to be able to limit the range over 
which we scan in order to get a reasonable performance in the face of lots of 
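
To make the two points above concrete, here is a minimal sketch in Python. It 
is only one possible answer to the apportioning question: it assumes metrics 
arrive as cumulative samples and spreads each increment linearly over the hours 
it spans. The names (apportion, cell_for), the "!" separator and the "hNN" 
column qualifiers are all made up for illustration and are not part of any 
ATSv2 schema.

```python
from collections import defaultdict

HOUR = 3600
DAY = 24 * HOUR


def apportion(samples):
    """Split cumulative metric samples into per-hour increments.

    samples: list of (epoch_seconds, cumulative_value), sorted by time.
    Assumption: the increment between two consecutive samples is spread
    linearly over the hours that the interval overlaps. Whatever was
    accumulated before the first sample is not attributed to any hour.
    """
    buckets = defaultdict(float)
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta, span = v1 - v0, t1 - t0
        if span <= 0:
            continue  # ignore duplicate or out-of-order timestamps
        t = t0
        while t < t1:
            hour_start = t // HOUR * HOUR
            end = min(hour_start + HOUR, t1)
            buckets[hour_start] += delta * (end - t) / span
            t = end
    return dict(buckets)


def cell_for(flow, hour_start):
    """Hypothetical layout: one row per (flow, day), one column per hour.

    Putting the day in the rowkey caps each row at 24 hourly columns.
    """
    day_start = hour_start // DAY * DAY
    hour_of_day = (hour_start - day_start) // HOUR
    return f"{flow}!{day_start}", f"h{hour_of_day:02d}"
```

For an app that reports 0 at t=0 and 100 at t=9000 (2.5 hours later), 
apportion attributes 40, 40 and 20 to the three hours involved. If apps only 
write once at completion, the whole value lands in the final hour, which is the 
hRaven behaviour described above.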

Flow level:
○ expect return: aggregated stats for a flow_run, flow_version and flow
I think "flow" level aggregations should really only mean flow-run level 
aggregation in the sense of the separation that [~sjlee0] mentioned above for 
HBase native online aggregations. I'm not sure that flow_version rollups even 
make sense. Flow_version are important to be able to pass in as a filter: give 
me stats for this flow only matching this version. This is useful for cases 
such as reducer estimation where a job can make effective use only of previous 
run data if the version of the flow hasn't changed. The fact that there were 
three versions of a Hive query is good to know. Knowing when each version first 
appeared is good to know. Knowing the total cost for version 2 is probably less 
Flow level aggregates are useful only with a particular timerange in mind. What 
was the cost for the DailyActiveUsers job (no matter the version) for the last 
week? How many bytes did the SearchJob read from HDFS in the last month?
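
A rough sketch of what I mean by "version as a filter, time range as the 
scope", in Python. The FlowRun record, its field names and the flow_cost helper 
are hypothetical and only serve to illustrate the query shape; a real 
implementation would be a scan over flow-run rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRun:
    flow: str       # flow name, e.g. "DailyActiveUsers"
    version: str    # flow version identifier
    run_id: int     # assumed to be the run start time in epoch seconds
    cost: float     # any per-run aggregated metric


def flow_cost(runs, flow, start, end, version=None):
    """Sum a flow-run level metric over [start, end).

    The version is only a filter: when it is None, all versions of the
    flow are included. No per-version rollup is stored anywhere.
    """
    return sum(
        r.cost
        for r in runs
        if r.flow == flow
        and start <= r.run_id < end
        and (version is None or r.version == version)
    )
```

The "cost for the last week no matter the version" question is then 
flow_cost(runs, "DailyActiveUsers", week_start, now), and the reducer 
estimation case passes version=... to restrict to comparable runs.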

Thoughts around queue level aggregation (in addition to Sangjin's comments that 
these should be time-based):
Queue level aggregates have additional complexities. First, queues can come and 
go very quickly and apps can be moved from queue to queue. For the purpose of 
normal shorter lived applications it might be tempting to use the final queue 
that a job ran in (this is the assumption we make in hRaven). With long running 
apps this assumption breaks down.
Now if an app runs for an hour and accumulates some value X for a metric Y it 
will be recorded as such in the original queue agg. Now the application gets 
moved and the new value of metric Y is now Z. Are we going to aggregate Z-X in 
the new queue, or simply all of Z? The sums of all metrics Z in the queues will 
not be the same as the sums of all apps or flows.
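
To illustrate the Z-X option, a minimal Python sketch under the assumption that 
an app reports cumulative values tagged with the queue it is in at that moment. 
The function name and input shape are hypothetical.

```python
from collections import defaultdict


def attribute_to_queues(reports):
    """Credit each queue with the increments reported while the app was in it.

    reports: list of (queue, cumulative_value) for one app, in time order.
    Each report's increment over the previous report goes to the queue named
    in that report. Usage accrued between the last report in the old queue
    and the move itself is therefore credited to the new queue, which is an
    approximation.
    """
    per_queue = defaultdict(float)
    last = 0.0
    for queue, value in reports:
        per_queue[queue] += value - last
        last = value
    return dict(per_queue)
```

With X=60 reported in the original queue and Z=100 reported after the move, the 
original queue gets 60 and the new queue gets 40, so the queue totals still add 
up to the app total. Recording all of Z in the new queue would count the first 
60 twice.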

In addition, queues can grow and shrink on the fly. Are we going to record 
that? At the very least we need to prefix the cluster in the rowkey so that we 
can differentiate different queues from different clusters.

And then there are hierarchical queues. Are we thinking of rolling stats up to 
each level, or just to the individual leaf queue? Will we structure the rowkeys 
so that we can do prefix scans for queues called /cluster/parent/childa 
/cluster/parent/childb ?
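
A small Python sketch of the prefix-scan idea, using a sorted list as a 
stand-in for the lexicographically ordered rowkeys of an HBase table. The key 
layout (cluster first, "/" as separator, trailing separator on every key) and 
the function names are assumptions for illustration only.

```python
from bisect import bisect_left

SEP = "/"


def queue_rowkey(cluster, queue_path):
    """Build a key such as /cluster/parent/childa/ with the cluster first.

    The trailing separator keeps a scan for "parent" from also matching a
    sibling queue named "parent2".
    """
    return SEP + SEP.join([cluster, *queue_path]) + SEP


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix from a sorted list of keys."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


def rollup_keys(cluster, queue_path):
    """Keys of every level that a leaf queue's stats would roll up to."""
    return [
        queue_rowkey(cluster, queue_path[:depth])
        for depth in range(1, len(queue_path) + 1)
    ]
```

A scan with the prefix /cluster/parent/ then returns childa and childb but not 
queues under /cluster/parent2/. Whether we write to every key from rollup_keys 
or only to the leaf is exactly the open question above.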

> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>                 Key: YARN-3815
>                 URL: https://issues.apache.org/jira/browse/YARN-3815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Timeline Service Nextgen Flow, User, Queue Level 
> Aggregations (v1).pdf
> Per previous discussions in some design documents for YARN-2928, the basic 
> scenario is the query for stats can happen on:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version 
> and flow 
> - User level, expect return: aggregated stats for applications submitted by 
> user
> - Queue level, expect return: aggregated stats for applications within the 
> Queue
> Application states is the basic building block for all other level 
> aggregations. We can provide Flow/User/Queue level aggregated statistics info 
> based on application states (a dedicated table for application states is 
> needed which is missing from previous design documents like HBase/Phoenix 
> schema design). 
