Sangjin Lee commented on YARN-3815:

AMs currently leverage YARN's AppTimelineCollector to forward entities to 
backend storage, so making AMs talk directly to backend storage is not 
considered to be safe.

Just to be clear, I'm *not* proposing that AMs write directly to the backend 
storage. AMs would continue to write through the app-level timeline collector. 
My proposal is that the AMs be responsible for setting the aggregated 
framework-specific metric values on the *YARN application entities*.

Let's consider the example of MR. MR itself would have its own entities such as 
job, tasks, and task attempts. These are distinct entities from the YARN 
entities such as application, app attempts, and containers. We can either (1) 
have the MR AM set framework-specific metric values on the YARN container 
entities and have YARN aggregate them up to applications, or (2) have the MR AM 
itself set the aggregated values on the YARN application entities.
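
As a rough illustration of the two options, here is a minimal sketch. The 
Entity class, the two functions, and the metric name are hypothetical 
stand-ins made up for this example; they are not the actual timeline service 
API. It only shows who does the summing in each case, assuming the metric is 
a simple additive counter.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-in for a timeline entity (not the real API).
@dataclass
class Entity:
    entity_type: str   # e.g. "YARN_CONTAINER" or "YARN_APPLICATION"
    entity_id: str
    metrics: dict = field(default_factory=dict)


def option1_yarn_aggregates(app_id, containers):
    """Option (1): the AM writes metrics on container entities; YARN sums them."""
    app = Entity("YARN_APPLICATION", app_id)
    for container in containers:
        for name, value in container.metrics.items():
            app.metrics[name] = app.metrics.get(name, 0) + value
    return app


def option2_am_sets_totals(app_id, job_totals):
    """Option (2): the AM already has job-level totals and sets them itself."""
    return Entity("YARN_APPLICATION", app_id, dict(job_totals))


# Illustrative data only; the metric name is an assumption.
containers = [
    Entity("YARN_CONTAINER", "container_1", {"MAP_INPUT_RECORDS": 40}),
    Entity("YARN_CONTAINER", "container_2", {"MAP_INPUT_RECORDS": 60}),
]
via_yarn = option1_yarn_aggregates("application_1", containers)
via_am = option2_am_sets_totals("application_1", {"MAP_INPUT_RECORDS": 100})
```

For an additive counter both paths end up with the same application entity; 
the difference is whether YARN or the framework performs the aggregation.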

I feel the latter approach is conceptually cleaner. The framework is ultimately 
responsible for its metrics (YARN doesn't even know what metrics there are). 
YARN would then pick up the framework-specific metrics at the app level and 
aggregate them from the app level onward to flows, users, and queues.
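
The app-level-onward aggregation could be sketched roughly as follows. The 
record layout (plain dicts with "flow", "user", "queue", and "metrics" keys), 
the roll_up function, and the metric name are assumptions for illustration, 
not the actual schema, and the sketch assumes metrics that can simply be 
summed.

```python
from collections import defaultdict


def roll_up(apps, key):
    """Sum app-level metrics per distinct value of `key` (flow, user, or queue)."""
    totals = defaultdict(lambda: defaultdict(int))
    for app in apps:
        for name, value in app["metrics"].items():
            totals[app[key]][name] += value
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(metrics) for group, metrics in totals.items()}


# Hypothetical application records carrying metrics already set by the AM.
apps = [
    {"flow": "etl", "user": "alice", "queue": "prod",
     "metrics": {"records_read": 100}},
    {"flow": "etl", "user": "bob", "queue": "prod",
     "metrics": {"records_read": 50}},
    {"flow": "report", "user": "alice", "queue": "adhoc",
     "metrics": {"records_read": 7}},
]

by_flow = roll_up(apps, "flow")
by_user = roll_up(apps, "user")
by_queue = roll_up(apps, "queue")
```

Note that YARN never needs to know what "records_read" means here; it only 
sums whatever the framework put on the application entity.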

In addition, most frameworks already have an aggregated view of the metrics. It 
would be very straightforward to emit them at the app level.

In summary, option (1) asks the framework to write metrics on its own entities 
(job, tasks, task attempts) plus YARN container entities. Option (2) asks the 
framework to write metrics on its own entities (job, tasks, task attempts) plus 
YARN app entities. IMO, the latter is a more reliable approach. We can discuss 
this further...

> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>                 Key: YARN-3815
>                 URL: https://issues.apache.org/jira/browse/YARN-3815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Timeline Service Nextgen Flow, User, Queue Level 
> Aggregations (v1).pdf
> Per previous discussions in some design documents for YARN-2928, the basic 
> scenario is the query for stats can happen on:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version 
> and flow 
> - User level, expect return: aggregated stats for applications submitted by 
> user
> - Queue level, expect return: aggregated stats for applications within the 
> Queue
> Application states is the basic building block for all other level 
> aggregations. We can provide Flow/User/Queue level aggregated statistics info 
> based on application states (a dedicated table for application states is 
> needed which is missing from previous design documents like HBase/Phoenix 
> schema design). 

This message was sent by Atlassian JIRA
