[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573830#comment-14573830
 ] 

Zhijie Shen commented on YARN-3051:
-----------------------------------

[~varun_saxena], thanks for working on the new patch. It seems to be a complete 
reader-side prototype, which is nice. I still need some time to take a thorough 
look, but I'd like to share my thoughts about the reader APIs.

IMHO, we may want to have or start with two sets of APIs: 1) the APIs to query 
the raw data and 2) the APIs to query the aggregation data.

1) APIs to query the raw data:

We would like to have APIs that let users zoom into the details of their 
jobs, and that give users the freedom to fetch the raw data and do the 
customized processing that ATS will not do. For example, Hive/Pig on Tez need 
this set of APIs to get the framework-specific data, process it and render it 
on their own web UI. We basically need 2 such APIs.

a. Get a single entity given an ID that uniquely locates the entity in the 
backend (We assume the uniqueness is assured somehow). 
* This API can be extended or split into multiple sub-APIs to get a single 
element of the entity, such as events, metrics and configuration.

b. Search for a set of entities that match the given predicates.
* We can start from the predicates that we used in ATS v1 (also for 
compatibility), but some of them may no longer apply.
* We may want to add more predicates to check the newly added elements in v2.
* With more predefined semantics, we can even query entities that belong to 
some container/attempt/application and so on.
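
To make the two raw-data calls above concrete, here is a minimal Java sketch. 
It is only an illustration: RawDataReader, Entity, InMemoryReader and all 
method names are hypothetical and are not taken from the YARN-3051 patch, and 
the in-memory map merely stands in for a real backing store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the YARN-3051 patch.
final class RawReaderSketch {

  // Minimal stand-in for a timeline entity: ID, type and metrics only.
  static final class Entity {
    final String id;
    final String type;
    final Map<String, Long> metrics;

    Entity(String id, String type, Map<String, Long> metrics) {
      this.id = id;
      this.type = type;
      this.metrics = metrics;
    }
  }

  interface RawDataReader {
    // (a) Fetch one entity by its unique ID; null if it does not exist.
    Entity getEntity(String id);

    // (b) Return up to 'limit' entities that satisfy the predicate.
    List<Entity> getEntities(Predicate<Entity> filter, int limit);

    // Sub-API of (a): fetch a single element of the entity, here its metrics.
    default Map<String, Long> getMetrics(String id) {
      Entity e = getEntity(id);
      return e == null ? Collections.<String, Long>emptyMap() : e.metrics;
    }
  }

  // Toy in-memory backend, present only to make the sketch runnable.
  static final class InMemoryReader implements RawDataReader {
    private final Map<String, Entity> store = new LinkedHashMap<>();

    void put(Entity e) {
      store.put(e.id, e);
    }

    @Override
    public Entity getEntity(String id) {
      return store.get(id);
    }

    @Override
    public List<Entity> getEntities(Predicate<Entity> filter, int limit) {
      List<Entity> out = new ArrayList<>();
      for (Entity e : store.values()) {
        if (out.size() >= limit) {
          break;
        }
        if (filter.test(e)) {
          out.add(e);
        }
      }
      return out;
    }
  }
}
```

Expressing the search as a generic predicate is just one option. The v1 
predicates could equally be modeled as explicit parameters, which is easier 
to push down into a backend query.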

2) APIs to query the aggregation data:

These are completely new in v2 and are its main advantage. With aggregation, 
we can answer statistical questions about the job, the user, the queue, the 
flow and the cluster. These APIs do not direct users to the individual 
entities put by the application, but return statistical data (carried by 
Application|User|Queue|Flow|ClusterEntity). 

a. Get the aggregation data of a certain level given the ID of the concept on 
that level, i.e., the job, the user, the queue, the flow or the cluster.

b. Search for the jobs, the users, the queues, the flows and the clusters 
given predicates.
* For the predicates, we could learn from the examples in hRaven.
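
The aggregation-side calls could be sketched in Java along the same lines. 
Again this is only an illustration under assumptions: AggregationReader, 
Aggregate, Level and the method names are hypothetical, the five levels simply 
mirror the Application|User|Queue|Flow|Cluster list above, and the in-memory 
map stands in for whatever storage holds the aggregated data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the YARN-3051 patch.
final class AggregationReaderSketch {

  // Aggregation levels, mirroring the list in the comment above.
  enum Level { APPLICATION, USER, QUEUE, FLOW, CLUSTER }

  // Statistical data for one concept (e.g. one user) on one level.
  static final class Aggregate {
    final Level level;
    final String id;
    final Map<String, Long> metrics;

    Aggregate(Level level, String id, Map<String, Long> metrics) {
      this.level = level;
      this.id = id;
      this.metrics = metrics;
    }

    // Convenience accessor: an absent metric is treated as 0.
    long metric(String name) {
      return metrics.getOrDefault(name, 0L);
    }
  }

  interface AggregationReader {
    // (a) Aggregation data for the concept with this ID on this level,
    //     or null if nothing is stored for it.
    Aggregate getAggregate(Level level, String id);

    // (b) All concepts on this level whose aggregate matches the predicate.
    List<Aggregate> search(Level level, Predicate<Aggregate> filter);
  }

  // Toy in-memory backend, present only to make the sketch runnable.
  static final class InMemoryAggregationReader implements AggregationReader {
    private final Map<String, Aggregate> store = new LinkedHashMap<>();

    private static String key(Level level, String id) {
      return level + "/" + id;
    }

    void put(Aggregate a) {
      store.put(key(a.level, a.id), a);
    }

    @Override
    public Aggregate getAggregate(Level level, String id) {
      return store.get(key(level, id));
    }

    @Override
    public List<Aggregate> search(Level level, Predicate<Aggregate> filter) {
      List<Aggregate> out = new ArrayList<>();
      for (Aggregate a : store.values()) {
        if (a.level == level && filter.test(a)) {
          out.add(a);
        }
      }
      return out;
    }
  }
}
```

For example, search(Level.USER, a -> a.metric("cpu") > 50) would return the 
users whose aggregated "cpu" metric exceeds 50, where the metric name is made 
up for the example.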


> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
