Zhijie Shen commented on YARN-3051:

First of all, I'd like to say this is not the finalized reader API, but one 
we are okay to start with: two types of query and a set of essential 
parameters, which focus on selecting which entities to return. We can 
definitely iterate over the APIs to add more parameters to trim the results 
and to control sub-entity information.

bq. We had decided that user may not need to retrieve all the configs and 
metrics and hence we should have a parameter to indicate that ? A list of 
metrics and confs user wants to retrieve ? For both the APIs'. I had included 
this in the patch I had made. Do we need it ?

Yeah, we could have these parameters, but I'm wondering what an efficient way 
would be to retrieve part of the configs/metrics from a huge set. For example, 
suppose I'm interested in all the mapred configs of my job. What should I do? 
Enumerating all the mapred configs I want to retrieve in the query parameter is 
a nightmare. My immediate thought is regex, but I don't want to include this 
parameter in the initial version until we're clear about how to specify it.
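
To make the regex idea a bit more concrete, here is a rough sketch of what 
selecting configs by key pattern could look like on the reader side. 
ConfigSelector and its select method are hypothetical names for illustration 
only; they are not in the patch, and a real storage implementation would 
probably want to push the pattern down rather than filter in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the patch): keeps only the config
// entries whose key fully matches the given regular expression.
final class ConfigSelector {
  private ConfigSelector() {}

  static Map<String, String> select(Map<String, String> configs,
                                    String keyRegex) {
    Pattern pattern = Pattern.compile(keyRegex);
    Map<String, String> selected = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : configs.entrySet()) {
      // matches() requires the whole key to match, not just a substring.
      if (pattern.matcher(entry.getKey()).matches()) {
        selected.put(entry.getKey(), entry.getValue());
      }
    }
    return selected;
  }
}
```

With something like this, a caller could pass a single pattern such as 
"mapred.*" instead of enumerating every key.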

bq. Shouldn't we have metrics filters to support queries like fetch entities 
which have a metric > a certain value. In the patch I had included support for 
relational operators.

We should. See my TODO comment. The problem again is that it's not a simple 
predicate. How do we want to abstract and support it? You give the example ">", 
but we need to take care of "<", "=", "!=", "like" and so on.
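
One possible way to abstract it, just as a sketch for discussion: model the 
operator as an enum and the filter as (metric id, operator, value). CompareOp 
and MetricFilter are hypothetical names, metric values are assumed to be longs, 
and "like" is left out here because it only makes sense for string-valued 
fields.

```java
import java.util.Map;

// Hypothetical operator set for numeric metric filters.
enum CompareOp {
  GT, LT, EQ, NE;

  boolean test(long actual, long expected) {
    switch (this) {
      case GT: return actual > expected;
      case LT: return actual < expected;
      case EQ: return actual == expected;
      case NE: return actual != expected;
      default: throw new AssertionError("unknown operator " + this);
    }
  }
}

// Hypothetical filter: "metric <op> value", e.g. "cpu > 50".
final class MetricFilter {
  private final String metricId;
  private final CompareOp op;
  private final long value;

  MetricFilter(String metricId, CompareOp op, long value) {
    this.metricId = metricId;
    this.op = op;
    this.value = value;
  }

  // An entity without the metric never passes, whatever the operator.
  boolean accepts(Map<String, Long> metrics) {
    Long actual = metrics.get(metricId);
    return actual != null && op.test(actual, value);
  }
}
```

Whether a missing metric should fail a "!=" filter is one of the semantics 
we would need to decide; the sketch simply rejects it.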

bq. We do not need flowId and flowRunId to get an entity. But it can still be 
an optional argument so that we avoid peek into the table which gets them based 
on cluster and appid. Thoughts ?

Yeah, it makes sense. Imagine we have the web UI, and the user is directed from 
the flow page to the app page and moves on; he's going to carry the flow 
information. If the user can provide flowId/flowRunId, we can locate the entity 
more efficiently. We can have the two params and make them optional. Also, it 
seems that I've missed userId too. It's the first piece of the entity key. 
IMHO, we should have it and make it mandatory to avoid scanning through the 
whole key space. And it should be reasonable that we take the requester as the 
user and only search his entity space, not others'.
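
Roughly, the lookup context I have in mind would look like the sketch below. 
EntityLookup and needsFlowLookup are hypothetical names, not what is in the 
Reader interface, and I'm assuming flowRunId is a long. The point is only that 
userId, cluster and appId are required, while the flow fields are optional 
hints.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical lookup context for the "get a single entity" query.
final class EntityLookup {
  private final String userId;
  private final String clusterId;
  private final String appId;
  private final String flowId;   // optional, may be null
  private final Long flowRunId;  // optional, may be null

  EntityLookup(String userId, String clusterId, String appId,
               String flowId, Long flowRunId) {
    // Mandatory pieces: fail fast instead of scanning the whole key space.
    this.userId = Objects.requireNonNull(userId, "userId is mandatory");
    this.clusterId = Objects.requireNonNull(clusterId, "clusterId is mandatory");
    this.appId = Objects.requireNonNull(appId, "appId is mandatory");
    this.flowId = flowId;
    this.flowRunId = flowRunId;
  }

  String userId() { return userId; }
  String clusterId() { return clusterId; }
  String appId() { return appId; }
  Optional<String> flowId() { return Optional.ofNullable(flowId); }
  Optional<Long> flowRunId() { return Optional.ofNullable(flowRunId); }

  // True when the reader must first resolve the flow from cluster and appId.
  boolean needsFlowLookup() {
    return flowId == null || flowRunId == null;
  }
}
```

When the caller comes from the flow page, both flow fields are set and the 
extra lookup can be skipped.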

bq. Will we fetch entities across entityTypes ? We also have events as filters 
here. They may not match across entity types. Thoughts ?

Good point, let's go with a single entityType first.

bq. As per our previous discussion I had also included metrics time windows in 
the APIs'. This may aid in plotting graphs for long running apps. Thoughts ?

This seems to belong to (contents to retrieve), and it is not difficult to 
enforce the window. We can add this to the param list. One question is whether 
we want to specify the window per metric or for all metrics. Personally, I 
prefer to defer it, together with fetching particular configs/metrics, to a 
later enhancement about (contents to retrieve). What do you think?
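
For reference, enforcing a window on one metric's time series is 
straightforward if the series is kept sorted by timestamp. The sketch below 
assumes a single window applied to a series of (timestamp in ms, long value) 
pairs; MetricWindow and slice are hypothetical names, and the per-metric versus 
all-metrics question above is left open.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: trims a metric time series to [startMs, endMs].
final class MetricWindow {
  private MetricWindow() {}

  static NavigableMap<Long, Long> slice(NavigableMap<Long, Long> series,
                                        long startMs, long endMs) {
    if (startMs > endMs) {
      throw new IllegalArgumentException("window start is after window end");
    }
    // subMap returns a view; copy it so the caller gets an independent map.
    return new TreeMap<>(series.subMap(startMs, true, endMs, true));
  }
}
```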

I've updated the Reader interface accordingly.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.
