Vrushali C commented on YARN-3816:

bq. If I understand the problem correctly, we have two dimensions in a 
flow/user level aggregation: one dimension for all entities belong to this 
flow/user, another dimension for time.

Ah, not quite. The time dimension goes with flow/user/queue. For example, we 
will aggregate user-level stats over a time period such as a day or a week. 
Similarly for flows: flows are aggregated over one day or one week in hRaven. 
Ditto for users and queues. So let's say, for simplicity, user1 ran a wordcount 
MapReduce job three times on Monday and a sleep job two times on Monday. Now 
the daily aggregation table for user1 will have, for each metric that is a 
counter, the sum over that day, that is

M1 for user1 on Monday = M1 from wordcount.run1 on Monday + M1 from 
wordcount.run2 on Monday + M1 from wordcount.run3 on Monday + M1 from 
sleep.run1 on Monday + M1 from sleep.run2 on Monday.


Now, for flows on Monday:

M1 for wordcount on Monday = M1 from wordcount.run1 on Monday + M1 from 
wordcount.run2 on Monday + M1 from wordcount.run3 on Monday

M1 for sleep on Monday = M1 from sleep.run1 on Monday + M1 from sleep.run2 on 
Monday
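
To make the sums above concrete, here is a minimal sketch in Python. It is not 
hRaven or timeline service code: the record layout, the metric values, and the 
helper name aggregate_daily are all assumptions made up for illustration, and 
the flow key is simplified to the flow name alone.

```python
from collections import defaultdict

# Hypothetical per-run records: (user, flow, run, day, value of counter M1).
# The numbers are invented for illustration only.
runs = [
    ("user1", "wordcount", "run1", "monday", 10),
    ("user1", "wordcount", "run2", "monday", 20),
    ("user1", "wordcount", "run3", "monday", 30),
    ("user1", "sleep", "run1", "monday", 1),
    ("user1", "sleep", "run2", "monday", 2),
]


def aggregate_daily(records, key_index):
    """Sum M1 per (key, day), where key is the field at key_index."""
    totals = defaultdict(int)
    for rec in records:
        day, value = rec[3], rec[4]
        totals[(rec[key_index], day)] += value
    return dict(totals)


# User-level daily aggregation: every run by the user on that day is summed.
user_daily = aggregate_daily(runs, 0)

# Flow-level daily aggregation: only runs of the same flow are summed.
flow_daily = aggregate_daily(runs, 1)
```

With these made-up values, the user-level total for user1 on Monday is the sum 
of all five runs, while each flow-level total covers only that flow's runs.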

For timeseries, we need to decide what aggregation means. One option is to 
normalize the values to minute-level granularity: add up the values that fall 
within each minute. Anything that occurred within a minute is assigned to the 
top of that minute; e.g., something that happened at 2 min 10 seconds is 
considered to have occurred at 2 min. That way we can sum up across 
flows/users/runs, etc.
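
A rough sketch of that minute-level option, again in Python and again only an 
illustration: timestamps are assumed to be plain seconds, and the names 
floor_to_minute and aggregate_per_minute as well as the sample points are 
hypothetical.

```python
from collections import defaultdict


def floor_to_minute(ts_seconds):
    """Assign a timestamp (in seconds) to the top of its minute."""
    return ts_seconds - (ts_seconds % 60)


def aggregate_per_minute(series_list):
    """Sum values from several time series into per-minute buckets."""
    totals = defaultdict(int)
    for series in series_list:
        for ts, value in series:
            totals[floor_to_minute(ts)] += value
    return dict(sorted(totals.items()))


# Two hypothetical runs, each a list of (timestamp in seconds, value) points.
# 130 s is 2 min 10 s, so it lands in the bucket starting at 120 s (2 min).
run_a = [(130, 5), (150, 7), (185, 1)]
run_b = [(125, 2), (190, 4)]

per_minute = aggregate_per_minute([run_a, run_b])
```

Because every point is snapped to the same minute boundaries, series from 
different runs, flows, or users line up and can be summed bucket by bucket. 
Whether summing is the right operation for a given metric (as opposed to, say, 
averaging) is a separate decision this sketch does not settle.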

> [Aggregation] App-level Aggregation for YARN system metrics
> -----------------------------------------------------------
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other levels of aggregation (Flow/User/Queue) can be done more efficiently 
> on top of application-level aggregations rather than raw entity-level data, 
> as far fewer rows need to be scanned (after filtering out non-aggregated 
> entities such as events, configurations, etc.).
