Vrushali C commented on YARN-3817:

bq. That said, we may want to try to implement the offline aggregations as 
map-reduce jobs as our first attempt.

+1 to this. 
We haven't yet worked out how to aggregate time-series data at the flow level. 

I also did some size estimates, and the actual volume will be much higher than 
what you have above, since time-series data is also emitted by the application 
master and by individual containers. 

I will share an Excel spreadsheet I created so that we can think through how we 
want to emit the time-series metrics.

> [Aggregation] Flow and User level aggregation on Application States table
> -------------------------------------------------------------------------
>                 Key: YARN-3817
>                 URL: https://issues.apache.org/jira/browse/YARN-3817
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: Detail Design for Flow and User Level Aggregation.pdf
> We need flow/user level aggregation to present flow/user related states to 
> end users.
> Flow level aggregation involves three levels of aggregation:
> - The first level is the Flow_run level, which represents one execution of a 
> flow and shows the exact aggregated data for that run.
> - The second level is the Flow_version level, which represents summary info 
> for a version of a flow.
> - The third level is the Flow level, which represents summary info for a 
> specific flow.
> User level aggregation represents summary info for a specific user. It should 
> include accumulated totals and statistical means at two levels (application 
> and flow), such as the number of flows and applications, resource 
> consumption, and mean resource usage per app or per flow. 
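A minimal sketch of the three flow levels and the user-level summary described above, in plain Java (16+, no Hadoop dependencies) so it runs standalone. Everything here is an assumption for illustration: the `App` record, its fields, the single `cpuMillis` metric, and the `user!flow!version!runId` key format are hypothetical and not the timeline service schema or the attached design. Each level groups the same application records on a progressively shorter key (run, then version, then flow). That grouping is the same shape an offline map-reduce job would have: the key function plays the mapper's output key, and the per-key sum plays the reducer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FlowUserRollupSketch {
    // Hypothetical per-application record; field names are illustrative only.
    record App(String user, String flow, String version, long runId, long cpuMillis) {}

    // Sum the metric over all apps that share the same key (the "reduce" step).
    static Map<String, Long> rollup(List<App> apps, Function<App, String> key) {
        return apps.stream().collect(Collectors.groupingBy(
            key, TreeMap::new, Collectors.summingLong(App::cpuMillis)));
    }

    // Level 1: one row per execution of a flow.
    static Map<String, Long> byFlowRun(List<App> apps) {
        return rollup(apps, a -> a.user() + "!" + a.flow() + "!" + a.version() + "!" + a.runId());
    }

    // Level 2: all runs of one version of a flow.
    static Map<String, Long> byFlowVersion(List<App> apps) {
        return rollup(apps, a -> a.user() + "!" + a.flow() + "!" + a.version());
    }

    // Level 3: all versions of one flow.
    static Map<String, Long> byFlow(List<App> apps) {
        return rollup(apps, a -> a.user() + "!" + a.flow());
    }

    // User level: counts, total, and means per application and per flow.
    record UserSummary(long flows, long apps, long totalCpu, double cpuPerApp, double cpuPerFlow) {}

    static UserSummary summarize(List<App> apps, String user) {
        List<App> mine = apps.stream().filter(a -> a.user().equals(user)).toList();
        long flows = mine.stream().map(App::flow).distinct().count();
        long total = mine.stream().mapToLong(App::cpuMillis).sum();
        long n = mine.size();
        return new UserSummary(flows, n, total,
            n == 0 ? 0.0 : (double) total / n,
            flows == 0 ? 0.0 : (double) total / flows);
    }

    public static void main(String[] args) {
        List<App> apps = List.of(
            new App("alice", "etl", "v1", 1L, 100L),
            new App("alice", "etl", "v1", 2L, 50L),
            new App("alice", "etl", "v2", 3L, 30L),
            new App("alice", "report", "v1", 4L, 20L));
        System.out.println(byFlowRun(apps));
        System.out.println(byFlowVersion(apps));
        System.out.println(byFlow(apps));
        System.out.println(summarize(apps, "alice"));
    }
}
```

With the sample data, the flow level yields 180 for `alice!etl` and 20 for `alice!report`, and the user summary reports 2 flows, 4 apps, a mean of 50.0 per app and 100.0 per flow. Recomputing every level from raw app records keeps the sketch simple; a real job would more likely derive each level from the one below it to avoid rescanning.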

This message was sent by Atlassian JIRA
