[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593627#comment-14593627 ]
Zhijie Shen commented on YARN-3051:
-----------------------------------

First of all, I'd like to say this is not the finalized reader API, but one we are okay to start with: two types of query, and a set of essential parameters that focus on tuning which entities to return. We can definitely iterate over the APIs to add more parameters to trim the results and to control sub-entity information.

bq. We had decided that user may not need to retrieve all the configs and metrics and hence we should have a parameter to indicate that ? A list of metrics and confs user wants to retrieve ? For both the APIs'. I had included this in the patch I had made. Do we need it ?

Yeah, we could have these parameters, but I'm wondering about an efficient way to retrieve part of the configs/metrics from a huge set. For example, say I'm interested in all the mapred configs of my job. What should I do? Enumerating every mapred config I want to retrieve in the query parameter would be a nightmare. My immediate thought is regex, but I don't want to include this parameter in the original version until we're clear about how to specify it.

bq. Shouldn't we have metrics filters to support queries like fetch entities which have a metric > a certain value. In the patch I had included support for relational operators.

We should. See my TODO comment. The problem again is that it's not a simple predicate. How do we want to abstract and support it? You give the example ">", but we need to take care of "<", "=", "!=", "like" and so on.

bq. We do not need flowId and flowRunId to get an entity. But it can still be an optional argument so that we avoid peek into the table which gets them based on cluster and appid. Thoughts ?

Yeah, it makes sense to. Imagine we have the web UI, and the user is directed from the flow page to the app page and moves on; he's going to carry the flow information along. If the user can provide flowId/flowRunId, we can locate the entity more efficiently.
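
To make the optional flowId/flowRunId idea concrete, here is a rough sketch, not the code in the attached patches. All names (FlowContext, FlowResolver, resolve, tableLookups) are hypothetical, and an in-memory map stands in for the table that maps cluster and appId to the flow. The point is only that a caller who already carries the flow information lets the reader skip that lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical types for illustration only; they are not part of the YARN-3051 patch.

/** The flow information a caller may already carry (e.g. from the flow page). */
final class FlowContext {
  final String flowId;
  final long flowRunId;

  FlowContext(String flowId, long flowRunId) {
    this.flowId = flowId;
    this.flowRunId = flowRunId;
  }
}

/** Resolves the flow of an app, consulting the lookup table only when needed. */
final class FlowResolver {
  // Stand-in for the backing table keyed by cluster and appId (an assumption).
  private final Map<String, FlowContext> appToFlow = new HashMap<>();

  // Counts how often the table had to be consulted, to make the saving visible.
  int tableLookups = 0;

  void register(String clusterId, String appId, FlowContext ctx) {
    appToFlow.put(clusterId + "!" + appId, ctx);
  }

  /**
   * Returns the caller-supplied flow context if there is one; otherwise
   * falls back to the table lookup by cluster and appId.
   *
   * @param supplied optional flow context, may be null
   */
  FlowContext resolve(String clusterId, String appId, FlowContext supplied) {
    if (supplied != null) {
      return supplied; // fast path: no table access at all
    }
    tableLookups++;
    FlowContext ctx = appToFlow.get(clusterId + "!" + appId);
    if (ctx == null) {
      throw new IllegalArgumentException(
          "No flow registered for app " + appId + " in cluster " + clusterId);
    }
    return ctx;
  }
}
```

Passing null for the flow context takes the slow path through the table; passing a FlowContext returns it directly and leaves tableLookups unchanged.
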
We can add the two params and make them optional. Also, it seems that I've missed userId too. It's the first component of the entity key. IMHO, we should have it and make it mandatory to avoid scanning through the whole key space. And it should be reasonable to take the requester as the user and only search his entity space, not others'.

bq. Will we fetch entities across entityTypes ? We also have events as filters here. They may not match across entity types. Thoughts ?

Good point, let's go with a single entityType first.

bq. As per our previous discussion I had also included metrics time windows in the APIs'. This may aid in plotting graphs for long running apps. Thoughts ?

This seems to belong to (contents to retrieve), and it's not difficult to enforce the window. We can add this to the param list. One question is whether we want to specify the window per metric or for all metrics. Personally, I prefer to defer it, together with fetching particular configs/metrics, to a later enhancement about (contents to retrieve). What do you think?

I've updated the Reader interface accordingly.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051-YARN-2928.003.patch,
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch,
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch,
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be
> implemented by multiple backing storage implementations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)