[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589036#comment-14589036 ]

Sangjin Lee commented on YARN-3051:
-----------------------------------

Sorry it has taken me a while to chime in on this JIRA. I've just gone over the 
recent comments, and also skimmed through the latest patch. BTW, the latest 
patch doesn't seem to apply cleanly (conflicts on {{yarn.cmd}}). 
[~varun_saxena], could you kindly check the latest patch to see if it needs to 
be updated?

I agree with most of the ideas put forward by folks in the comments. In
particular, I agree with [~zjshen] that it'd be desirable to have more specific
APIs on the user-oriented side of the code and somewhat more generic (for lack
of a better term) APIs on the storage-interaction side (namely the
{{TimelineReader}} interface in its current form).
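
A minimal Java sketch of that layering, for illustration only: specific,
user-oriented methods implemented on top of one generic storage-facing call.
All names here ({{GenericReader}}, {{UserFacingReader}}, {{Entity}}) and the
entity-type strings are hypothetical, not the API in the attached patches.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical entity shape, for illustration only.
record Entity(String type, String id, Map<String, String> info) {}

// Generic, storage-facing reader: one flexible call (hypothetical signature).
interface GenericReader {
  List<Entity> getEntities(String clusterId, String appId, String entityType,
      Set<String> fieldsToRetrieve);
}

// User-oriented facade: specific methods built on the generic call.
final class UserFacingReader {
  private final GenericReader storage;

  UserFacingReader(GenericReader storage) {
    this.storage = storage;
  }

  // "Get the application" becomes a generic query for the application entity type.
  Entity getApp(String clusterId, String appId) {
    List<Entity> apps =
        storage.getEntities(clusterId, appId, "YARN_APPLICATION", Set.of("INFO"));
    return apps.isEmpty() ? null : apps.get(0);
  }

  // "List the containers" reuses the same generic call with a different entity type.
  List<Entity> getContainers(String clusterId, String appId) {
    return storage.getEntities(clusterId, appId, "YARN_CONTAINER", Set.of());
  }
}

public class LayeredReaderDemo {
  public static void main(String[] args) {
    // In-memory stand-in for a storage implementation; it only returns app entities.
    GenericReader fake = (cluster, app, type, fields) ->
        type.equals("YARN_APPLICATION")
            ? List.of(new Entity(type, app, Map.of("cluster", cluster)))
            : List.<Entity>of();
    UserFacingReader reader = new UserFacingReader(fake);
    System.out.println(reader.getApp("c1", "app_1").id());          // prints app_1
    System.out.println(reader.getContainers("c1", "app_1").size()); // prints 0
  }
}
```

The storage implementations would then only need to implement the generic
call, while new user-facing queries are added in the facade.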

The {{TimelineReader}} API has two goals. First, it should be generic/flexible
enough to accommodate a wide range of queries, including the current queries
as well as possible future ones. Second, it should help the storage
implementations translate those queries into efficient queries against the
storage itself.

One idea that may help in this regard is to introduce further coarse-grained
concepts and use them in the {{TimelineReader}} API. It's already doing that to
some extent, and we should push it further. For instance, it might be helpful
to create a *{{Context}}*. The unique context for most of the queries would
involve the cluster id and the app id. So we can make the cluster id and the
app id part of the {{Context}} object and have {{TimelineReader}} deal with
{{Context}} instead of enumerating things like the cluster id explicitly in its
methods.
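
To make the {{Context}} idea concrete, here is a minimal Java sketch
contrasting a method that enumerates the scoping ids with one that takes a
single context object. The names {{TimelineReaderContext}},
{{ReaderWithExplicitIds}} and {{ReaderWithContext}} are hypothetical.

```java
import java.util.Objects;

// Hypothetical context object: the identifiers that scope most reader queries.
record TimelineReaderContext(String clusterId, String appId) {
  TimelineReaderContext {
    Objects.requireNonNull(clusterId, "clusterId");
    Objects.requireNonNull(appId, "appId");
  }
}

// Before: each method enumerates the scoping identifiers explicitly.
interface ReaderWithExplicitIds {
  String getEntity(String clusterId, String appId, String entityType, String entityId);
}

// After: methods take one Context, so adding a scoping field later
// changes the Context class rather than every method signature.
interface ReaderWithContext {
  String getEntity(TimelineReaderContext context, String entityType, String entityId);
}

public class ContextDemo {
  public static void main(String[] args) {
    // Toy implementation that just echoes the resolved location of the entity.
    ReaderWithContext reader = (c, type, id) ->
        c.clusterId() + "/" + c.appId() + "/" + type + "/" + id;
    TimelineReaderContext ctx = new TimelineReaderContext("cluster1", "app_1");
    System.out.println(reader.getEntity(ctx, "YARN_CONTAINER", "container_1"));
    // prints cluster1/app_1/YARN_CONTAINER/container_1
  }
}
```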

Similarly, we might want to define *predicates and/or filters*, and use them in
the {{TimelineReader}} API. In essence, a query against the storage can be
viewed as (context) + (predicate/filters) + (contents to retrieve). Then we
could consolidate the method arguments into these coarse-grained objects.
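
A minimal Java sketch of a query modeled as (context) + (filters) + (contents
to retrieve), evaluated by a toy in-memory reader. Every name here
({{ReaderQuery}}, {{QueryContext}}, {{QueryFilters}}, {{EntityField}},
{{InMemoryTimelineReader}}) is hypothetical. The filters are deliberately plain
data rather than lambdas, on the assumption that a storage implementation
needs to inspect them to build an efficient storage-side query.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// All names below are hypothetical; they only illustrate the decomposition of a query.
enum EntityField { INFO, CONFIGS, METRICS, EVENTS }

record QueryContext(String clusterId, String appId) {}

record StoredEntity(String clusterId, String appId, String entityType, String entityId,
    long createdTime, Map<EntityField, String> contents) {}

// Filters as data, so a backend can translate them into its own server-side filters.
record QueryFilters(String entityType, long createdTimeBegin, long createdTimeEnd) {
  boolean matches(StoredEntity e) {
    return e.entityType().equals(entityType)
        && e.createdTime() >= createdTimeBegin
        && e.createdTime() <= createdTimeEnd;
  }
}

// A query = (context) + (predicates/filters) + (contents to retrieve).
record ReaderQuery(QueryContext context, QueryFilters filters,
    Set<EntityField> fieldsToRetrieve) {}

final class InMemoryTimelineReader {
  private final List<StoredEntity> rows;

  InMemoryTimelineReader(List<StoredEntity> rows) {
    this.rows = rows;
  }

  List<StoredEntity> read(ReaderQuery q) {
    return rows.stream()
        // Context: a real backend might map this to a row-key prefix or partition.
        .filter(e -> e.clusterId().equals(q.context().clusterId())
            && e.appId().equals(q.context().appId()))
        // Predicates/filters.
        .filter(q.filters()::matches)
        // Contents to retrieve: keep only the requested fields.
        .map(e -> new StoredEntity(e.clusterId(), e.appId(), e.entityType(),
            e.entityId(), e.createdTime(),
            e.contents().entrySet().stream()
                .filter(en -> q.fieldsToRetrieve().contains(en.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue))))
        .collect(Collectors.toList());
  }
}

public class QueryDemo {
  public static void main(String[] args) {
    InMemoryTimelineReader reader = new InMemoryTimelineReader(List.of(
        new StoredEntity("c1", "app_1", "YARN_CONTAINER", "container_1", 100L,
            Map.of(EntityField.INFO, "host=a", EntityField.METRICS, "mem=1g")),
        new StoredEntity("c1", "app_1", "YARN_CONTAINER", "container_2", 500L,
            Map.of(EntityField.INFO, "host=b"))));
    ReaderQuery q = new ReaderQuery(new QueryContext("c1", "app_1"),
        new QueryFilters("YARN_CONTAINER", 0L, 200L), Set.of(EntityField.INFO));
    List<StoredEntity> result = reader.read(q);
    System.out.println(result.get(0).entityId() + " " + result.get(0).contents());
    // prints container_1 {INFO=host=a}
  }
}
```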

Also, for the context, I don't think we need to require things like the flow id
or the flow run id. The storage should be able to define the context and locate
entities with only the cluster id and the app id.
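
One way a storage could do that, sketched minimally in Java under stated
assumptions: the writer maintains a secondary index from (cluster id, app id)
to the flow coordinates, and the reader uses it to rebuild a flow-prefixed row
key. The index, the {{FlowAwareStore}} / {{FlowCoordinates}} / {{AppKey}}
names and the {{!}}-separated key format are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical flow coordinates; in a flow-oriented schema they might prefix the row key.
record FlowCoordinates(String user, String flowName, long flowRunId) {}

record AppKey(String clusterId, String appId) {}

final class FlowAwareStore {
  // Secondary index: (cluster id, app id) -> flow coordinates, maintained by the writer.
  private final Map<AppKey, FlowCoordinates> appToFlow = new HashMap<>();
  // Primary data, keyed by a row key that embeds the flow coordinates.
  private final Map<String, List<String>> rows = new HashMap<>();

  void write(String clusterId, String appId, FlowCoordinates flow, List<String> entities) {
    appToFlow.put(new AppKey(clusterId, appId), flow);
    rows.put(rowKey(clusterId, flow, appId), entities);
  }

  // Readers only supply cluster id and app id; the store resolves the flow itself.
  Optional<List<String>> read(String clusterId, String appId) {
    FlowCoordinates flow = appToFlow.get(new AppKey(clusterId, appId));
    if (flow == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(rows.get(rowKey(clusterId, flow, appId)));
  }

  private static String rowKey(String clusterId, FlowCoordinates f, String appId) {
    return String.join("!", clusterId, f.user(), f.flowName(),
        Long.toString(f.flowRunId()), appId);
  }
}

public class FlowLookupDemo {
  public static void main(String[] args) {
    FlowAwareStore store = new FlowAwareStore();
    store.write("c1", "app_1", new FlowCoordinates("alice", "wordcount", 1L),
        List.of("container_1", "container_2"));
    // The caller never mentions the flow name or flow run id.
    System.out.println(store.read("c1", "app_1").get()); // prints [container_1, container_2]
  }
}
```

The extra index costs one write per application, in exchange for keeping flow
details out of the reader-facing context.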

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
