Sangjin Lee commented on YARN-3815:

Here is my take on what's consensus, what's not, and what's currently out of 
scope. I may have misread the discussion and your impression/understanding may 
be different, so please feel free to chime in and comment on this!

(consensus or not controversial)
- applications table will be split from the main entities table
- app-level aggregation for framework-specific metrics will be done by the AM
- app-level aggregation for YARN-system container metrics will be done by the 
per-app timeline collector
- real-time aggregation does a simple sum for all types of metrics
- metrics API will be updated to differentiate gauges and counters (the type 
information will need to be persisted in the storage)
- for gauges, in addition to the simple sum-based aggregation, support average 
and max
- the flow-run table will be created to handle app-to-flow-run ("real-time") 
aggregation as proposed in the native HBase schema design
- auxiliary tables will be implemented as proposed in the native HBase schema 
design
- time-based aggregation (daily, weekly, monthly, etc.) will be done via 
Phoenix tables to enable ad-hoc queries
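As a rough illustration of the agreed metric semantics (a simple sum for every metric, with average and max additionally kept for gauges), here is a minimal Python sketch. `MetricType`, `AppMetricAggregator` and its methods are hypothetical names, not the timeline service API. Because the exact meaning of "average" and "max" is still listed as open, the sketch ASSUMES they are taken over successive app-level sums (i.e. over time). It deliberately does not compute per-container averages, which are out of scope.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class MetricType(Enum):
    """Hypothetical type tag; the notes only say gauges and counters
    will be distinguished and the type persisted."""
    COUNTER = "counter"
    GAUGE = "gauge"


@dataclass
class AppMetricAggregator:
    """Hypothetical per-app, per-metric aggregator (not the ATSv2 API)."""
    metric_type: MetricType
    # Latest value reported by each container.
    latest: Dict[str, float] = field(default_factory=dict)
    # App-level sums taken at each snapshot; used for gauge avg/max.
    sums: List[float] = field(default_factory=list)

    def update(self, container_id: str, value: float) -> None:
        # Counters report cumulative values, gauges report current readings;
        # either way the latest report replaces the previous one.
        self.latest[container_id] = value

    def finish(self, container_id: str) -> None:
        # A finished container no longer contributes a gauge reading, but its
        # final counter value still belongs to the app total.
        if self.metric_type is MetricType.GAUGE:
            self.latest.pop(container_id, None)

    def snapshot(self) -> float:
        # Real-time aggregation: a simple sum across containers.
        total = sum(self.latest.values())
        self.sums.append(total)
        return total

    def summary(self) -> Dict[str, float]:
        if not self.sums:
            raise ValueError("no snapshots taken yet")
        result = {"sum": self.sums[-1]}
        if self.metric_type is MetricType.GAUGE:
            # ASSUMPTION: avg/max are over successive app-level sums
            # (over time), not per container.
            result["avg"] = sum(self.sums) / len(self.sums)
            result["max"] = max(self.sums)
        return result


if __name__ == "__main__":
    mem = AppMetricAggregator(MetricType.GAUGE)
    mem.update("container_1", 1024.0)
    mem.update("container_2", 2048.0)
    mem.snapshot()               # 3072.0
    mem.finish("container_2")
    mem.snapshot()               # 1024.0
    print(mem.summary())         # {'sum': 1024.0, 'avg': 2048.0, 'max': 3072.0}
```

Keeping the type tag on the aggregator is what lets `finish` treat the two kinds differently: dropping a finished container's counter would make the app total go backwards, while keeping its gauge reading would overstate current usage.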

(questions remaining or undecided)
- for the average/max support for gauges (see above), confirm that's exactly 
what we want to support
- how to implement app-to-flow-run aggregation for gauges
- how to perform the time-based aggregation (MapReduce, using coprocessor 
endpoints, etc.)
- how to handle long-running apps for time-based aggregation
- considering adopting "null delimiters" (or other Phoenix-friendly tools) to 
support Phoenix reading data from the native HBase tables
- using flow collectors, user collectors, and queue collectors as means of 
performing (higher-level) aggregation
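To make the "null delimiter" idea concrete: to our understanding, Phoenix terminates each variable-length (VARCHAR) column of a composite row key with a zero byte, and stores a BIGINT as 8 big-endian bytes with the sign bit flipped so that unsigned byte order matches numeric order. The Python sketch below encodes a hypothetical flow-run key (cluster, user, flow name, run id) that way. The key components and their order are assumptions for illustration, not the schema from the proposal, and the encoding should be checked against Phoenix's type documentation before relying on it.

```python
import struct
from typing import Tuple

# Terminator for variable-length row-key components ("null delimiter").
SEPARATOR = b"\x00"


def encode_row_key(cluster: str, user: str, flow: str, run_id: int) -> bytes:
    """Encode a hypothetical flow-run row key: cluster, user, flow, run id."""
    parts = []
    for text in (cluster, user, flow):
        raw = text.encode("utf-8")
        if SEPARATOR in raw:
            raise ValueError(f"{text!r} contains the separator byte")
        parts.append(raw + SEPARATOR)
    # 8-byte big-endian signed long with the sign bit flipped, so unsigned
    # byte order (HBase's row order) matches numeric order.
    long_bytes = bytearray(struct.pack(">q", run_id))
    long_bytes[0] ^= 0x80
    parts.append(bytes(long_bytes))
    return b"".join(parts)


def decode_row_key(key: bytes) -> Tuple[str, str, str, int]:
    """Inverse of encode_row_key."""
    texts = []
    pos = 0
    for _ in range(3):
        end = key.index(SEPARATOR, pos)  # raises ValueError if missing
        texts.append(key[pos:end].decode("utf-8"))
        pos = end + 1
    long_bytes = bytearray(key[pos:])
    if len(long_bytes) != 8:
        raise ValueError("expected an 8-byte run id at the end of the key")
    long_bytes[0] ^= 0x80
    (run_id,) = struct.unpack(">q", bytes(long_bytes))
    return texts[0], texts[1], texts[2], run_id


if __name__ == "__main__":
    key = encode_row_key("cluster1", "alice", "wordcount", 42)
    print(decode_row_key(key))  # ('cluster1', 'alice', 'wordcount', 42)
    # The trailing separator keeps a prefix scan for user "alice"
    # from also matching user "alicex".
    prefix = b"cluster1\x00alice\x00"
    print(encode_row_key("cluster1", "alicex", "wordcount", 1).startswith(prefix))  # False
```

The terminator is what makes prefix scans safe: without it, a scan for one user or flow name would also match every name that merely starts with it.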

(out of scope)
- support per-container averages for gauges
- any aggregation other than time-based aggregation for flows, users, and queues
- creating a dependency on the explicit YARN flow API

> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>                 Key: YARN-3815
>                 URL: https://issues.apache.org/jira/browse/YARN-3815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Timeline Service Nextgen Flow, User, Queue Level 
> Aggregations (v1).pdf, aggregation-design-discussion.pdf, 
> hbase-schema-proposal-for-aggregation.pdf
> Per previous discussions in some design documents for YARN-2928, the basic 
> scenario is that queries for stats can happen at the following levels:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version 
> and flow 
> - User level, expect return: aggregated stats for applications submitted by 
> a user
> - Queue level, expect return: aggregated stats for applications within the 
> queue
> Application state is the basic building block for all other levels of 
> aggregation. We can provide Flow/User/Queue level aggregated statistics 
> based on application states (a dedicated table for application states is 
> needed, which is missing from previous design documents such as the 
> HBase/Phoenix schema design). 
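A minimal Python sketch of building the higher-level aggregates from per-application state, as the issue description proposes. `AppState`, `GROUP_KEYS` and `roll_up` are hypothetical names, the fields stand in for a dedicated application table, and only simple sums are used (the real-time operation listed as agreed above). Scoping flow names by user is an assumption of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable


@dataclass
class AppState:
    """Hypothetical row of a dedicated application table."""
    app_id: str
    user: str
    queue: str
    flow: str
    flow_version: str
    flow_run: int
    metrics: Dict[str, float]


# Hypothetical grouping keys, one per higher-level aggregation.
GROUP_KEYS: Dict[str, Callable[[AppState], Hashable]] = {
    "user": lambda app: app.user,
    "queue": lambda app: app.queue,
    "flow": lambda app: (app.user, app.flow),
    "flow_version": lambda app: (app.user, app.flow, app.flow_version),
    "flow_run": lambda app: (app.user, app.flow, app.flow_run),
}


def roll_up(apps: Iterable[AppState],
            level: str) -> Dict[Hashable, Dict[str, float]]:
    """Sum each metric over all applications that share a grouping key."""
    key_of = GROUP_KEYS[level]
    totals: Dict[Hashable, Dict[str, float]] = defaultdict(
        lambda: defaultdict(float))
    for app in apps:
        bucket = totals[key_of(app)]
        for name, value in app.metrics.items():
            bucket[name] += value
    # Convert nested defaultdicts to plain dicts for the caller.
    return {key: dict(metrics) for key, metrics in totals.items()}


if __name__ == "__main__":
    apps = [
        AppState("app_1", "alice", "default", "wordcount", "v1", 1, {"cpu_ms": 100.0}),
        AppState("app_2", "alice", "default", "wordcount", "v1", 1, {"cpu_ms": 50.0}),
        AppState("app_3", "bob", "etl", "sort", "v2", 7, {"cpu_ms": 30.0}),
    ]
    print(roll_up(apps, "user"))  # {'alice': {'cpu_ms': 150.0}, 'bob': {'cpu_ms': 30.0}}
```

Per the comment above, user and queue aggregation is in scope only as time-based aggregation, so in practice `apps` would be the applications falling in one time bucket (e.g. one day).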

This message was sent by Atlassian JIRA
