[
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593627#comment-14593627
]
Zhijie Shen commented on YARN-3051:
-----------------------------------
First of all, I'd like to say this is not the finalized reader API, but the
one we are okay to start with: two types of query, and a set of essential
parameters that focus on tuning which entities to return. We can definitely
iterate over the APIs to add more parameters to trim the results and to
control sub-entity information.
bq. We had decided that the user may not need to retrieve all the configs and
metrics, and hence we should have a parameter to indicate that? A list of
metrics and confs the user wants to retrieve? For both the APIs. I had included
this in the patch I had made. Do we need it?
Yeah, we could have these parameters, but I'm wondering what the efficient way
is to retrieve part of the configs/metrics from a huge set. For example,
suppose I'm interested in all the mapred configs of my job. What should I do?
Enumerating all the mapred configs I want to retrieve in the query parameter
is a nightmare. My immediate thought is regex, but I don't want to include
this parameter in the initial version until we're clear about how to specify it.
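To illustrate the regex idea, here is a minimal sketch, assuming configs are
held as a plain key/value map. ConfigSelector and select() are hypothetical
names for illustration only and are not part of any patch on this issue.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the YARN-3051 patch.
final class ConfigSelector {
  private ConfigSelector() {}

  /** Returns only the config entries whose key fully matches keyRegex. */
  static Map<String, String> select(Map<String, String> configs,
                                    String keyRegex) {
    Pattern pattern = Pattern.compile(keyRegex);
    Map<String, String> selected = new TreeMap<>();
    for (Map.Entry<String, String> e : configs.entrySet()) {
      if (pattern.matcher(e.getKey()).matches()) {
        selected.put(e.getKey(), e.getValue());
      }
    }
    return selected;
  }
}
```

With this, a single parameter such as "mapred.*" would select every key that
starts with "mapred", instead of the caller enumerating each key. How such a
pattern would be passed in the query is still the open question.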
bq. Shouldn't we have metrics filters to support queries like fetch entities
which have a metric > a certain value. In the patch I had included support for
relational operators.
We should. See my TODO comment. The problem again is that it's not a simple
predicate. How do we want to abstract and support it? You give the example ">",
but we need to take care of "<", "=", "!=", "like" and so on.
bq. We do not need flowId and flowRunId to get an entity. But it can still be
an optional argument so that we avoid peeking into the table which gets them
based on cluster and appid. Thoughts?
Yeah, it makes sense. Imagine we have the web UI, and the user is directed
from the flow page to the app page and moves on; he's going to carry the flow
information along. If the user can provide flowId/flowRunId, we can locate the
entity more efficiently. We can have the two params and make them optional.
Also, it seems that I've missed userId too. It's the first piece of the entity
key. IMHO, we should have it and make it mandatory to avoid scanning through
the whole key space. And it should be reasonable that we take the requester as
the user and only search his entity space, but not others'.
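The mandatory/optional split above could look roughly like the following
sketch. EntityQueryContext and needsFlowLookup() are hypothetical names, and
treating clusterId and appId as required is an assumption made here for
illustration.

```java
import java.util.Objects;

// Hypothetical query context; not the actual Reader interface.
final class EntityQueryContext {
  private final String userId;     // mandatory: first piece of the entity key
  private final String clusterId;  // assumed required in this sketch
  private final String appId;      // assumed required in this sketch
  private final String flowId;     // optional, may be null
  private final Long flowRunId;    // optional, may be null

  EntityQueryContext(String userId, String clusterId, String appId,
                     String flowId, Long flowRunId) {
    this.userId = Objects.requireNonNull(userId, "userId is mandatory");
    this.clusterId = Objects.requireNonNull(clusterId, "clusterId");
    this.appId = Objects.requireNonNull(appId, "appId");
    this.flowId = flowId;
    this.flowRunId = flowRunId;
  }

  /**
   * True when the flow information is incomplete, so the reader would have to
   * look it up from cluster and appId before locating the entity.
   */
  boolean needsFlowLookup() {
    return flowId == null || flowRunId == null;
  }
}
```

A caller coming from the flow page would pass both flow values and skip the
extra lookup; a caller that only knows the app would pass null for both.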
bq. Will we fetch entities across entityTypes ? We also have events as filters
here. They may not match across entity types. Thoughts ?
Good point. Let's go with a single entityType first.
bq. As per our previous discussion I had also included metrics time windows in
the APIs. This may aid in plotting graphs for long running apps. Thoughts?
This seems to belong to (contents to retrieve), and the window is not difficult
to enforce. We can add this to the param list. One question is whether we
want to specify the window per metric or for all metrics. Personally, I prefer
to defer it, together with fetching particular configs/metrics, to a later
enhancement about (contents to retrieve). What do you think?
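For reference, enforcing a window on one metric could be as small as the
sketch below, assuming a metric's time series is kept as a sorted map from
timestamp to value. MetricWindow and trim() are hypothetical names. Applying
one window to all metrics would just call this once per metric with the same
bounds.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the patch.
final class MetricWindow {
  private MetricWindow() {}

  /**
   * Returns a copy of the points with startTs <= timestamp <= endTs.
   * subMap throws IllegalArgumentException if startTs > endTs.
   */
  static NavigableMap<Long, Long> trim(NavigableMap<Long, Long> series,
                                       long startTs, long endTs) {
    return new TreeMap<>(series.subMap(startTs, true, endTs, true));
  }
}
```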
I've updated the Reader interface accordingly.
> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch,
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch,
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch,
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be
> implemented by multiple backing storage implementations.