Sangjin Lee commented on YARN-4074:

Thanks for your comments [~varun_saxena]!

bq. This I guess is to retrieve entities by a certain time range. I havent 
added this in REST related JIRA and dont see support even here. Will handle it 
after PoC I guess. Correct ?

The flow activity table is a time-based set of data. The timestamp (day marker 
really) is there to order the activity in time. It is feasible to query the 
flow activity table based on time (e.g. "give me all the activity in the past 3 
days"). I didn't get around to it, but it should be pretty straightforward to 
support that after the POC. I'll file a JIRA for adding that support.

bq. Also I had added user as an optional query param in REST API code. I think 
querying by user wont really be a good idea looking at the row key. Will remove 

Yes, the way the data is laid out, cluster + user will not be an efficient 
query, as time is the component that gets in before the user.

The major code added in this patch is about the different readers based on 
table and a factory. It looks fine. However, number of parameters to the 
methods are quite a lot just like Reader API. As you mentioned elsewhere, maybe 
I can refactor this later. We should club together some things into logical 
things like context, filters, etc. That will reduce number of params.

That is spot on. I really didn't like having to repeat the long list of 
arguments. But since you're looking into a better way of capturing the filters 
and predicates, I'm not really changing things as part of this JIRA. Hope that 
is consistent with your understanding.

In the factory class, we have a sequence of if-else statements. Although its a 
matter of perspective, sequence of if-else look a little inelegant. But we may 
not have too many great options here. Thought of enums i.e. having create 
methods with implementation tied to each enum but entity type enum is not HBase 
specific. Any other option ? I guess if-else should be fine for now because not 
too many tables should be added in future, if any.

I agree. I wanted to use the switch-case statements, but the main issue was 
that the input is string, not enums. If it were enums, it could have been 

In xxxEntityReader classes, maybe createTable should be renamed to getTable 
because we are not really creating any table here. We are just getting/creating 
a table object. 
Also if I am not wrong, no need to create this object again and again as well. 
All it really holds is static information such as table name and conf.

Those are good suggestions. Yes, the {{BaseTable}} instances are thread-safe, 
and I think they can be reused. I'll update the patch to make those changes.

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, 
> YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.

This message was sent by Atlassian JIRA

Reply via email to