[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767944#comment-15767944
 ] 

Sangjin Lee commented on YARN-5585:
-----------------------------------

Sorry for chiming in late on the discussion. I haven't reviewed the patch yet, 
but here is my opinion.

I'm fine with passing {{fromId}} with the prefix and id concatenated with a 
colon (":") for multi-entity queries. I'm also OK with using only the prefix 
portion for such queries although I don't expect this to be an important use 
case.

As for specifying only the entity id for {{fromId}}, I don't know that 
this is important at all. Pagination requests would be coming mostly from 
non-human clients (e.g. UI, scripted REST clients, etc.), and as such they 
always have both pieces of information. It would be strange for them not to 
provide the id prefix. I am comfortable with just throwing an exception if the 
id prefix is missing in {{fromId}}.
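
As a rough sketch of the {{fromId}} handling described above (this is not taken from the patch; the class name {{FromIdParser}}, the nested {{FromId}} type, and the choice to split on the first colon are all assumptions made for illustration), the parsing could look like this. It accepts a prefix-only value written as "prefix:" and throws if the id prefix is missing.

```java
/** Hypothetical helper, not part of the YARN-5585 patch. */
final class FromIdParser {

    /** Parsed pair of id prefix and entity id (entity id may be empty). */
    static final class FromId {
        final long idPrefix;
        final String entityId;

        FromId(long idPrefix, String entityId) {
            this.idPrefix = idPrefix;
            this.entityId = entityId;
        }
    }

    /**
     * Parses "idPrefix:entityId". Assumption: the value is split on the
     * FIRST colon, so entity ids containing colons are kept intact.
     */
    static FromId parse(String fromId) {
        if (fromId == null) {
            throw new IllegalArgumentException("fromId must not be null");
        }
        int sep = fromId.indexOf(':');
        if (sep <= 0) {
            // No colon, or nothing before it: the id prefix is missing.
            throw new IllegalArgumentException(
                "fromId must be of the form <idPrefix>:<entityId>: " + fromId);
        }
        long idPrefix;
        try {
            idPrefix = Long.parseLong(fromId.substring(0, sep));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "id prefix in fromId is not a long: " + fromId, e);
        }
        return new FromId(idPrefix, fromId.substring(sep + 1));
    }
}
```

With this sketch, {{FromIdParser.parse("12:entity_1")}} yields prefix 12 and id "entity_1", while {{FromIdParser.parse("entity_1")}} is rejected.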

For queries by entity id (i.e. single entity queries), as noted there are 
really 2 distinct use cases: (1) queries with both id prefix and entity id 
(which would be mostly coming from non-human clients), and (2) queries with 
only entity id. (1) is not ambiguous at all.

(2) can be further divided into 2 cases: (2-1) there was no id prefix written 
to the storage (i.e. default prefix = 0), and (2-2) the client (most likely 
human) simply does not know the id prefix.

Long story short, I think we can support (2) with Varun's suggestion:
{quote}
I am wondering whether we can utilize the start and stop row in Scan for 
this, the reason being that idprefix can range from 0 to the max value of long. 
Thus, our start row can be cluster!user!flow!runid!appid!entitytype!0!entityid 
and, as the stop row is not inclusive, we can call 
TimelineStorageUtils#calculateTheClosestNextRowKeyForPrefix for 
cluster!user!flow!runid!appid!entitytype!LONG_MAX!entityid. This would mean 
that typically only one row will be scanned. We can anyway break out of the 
loop as soon as the first row (which will be true for almost all the cases) is 
found. We can use a PageFilter of 1 to keep the Scan and the result retrieved 
via it small. Thoughts?
{quote}

If the id prefix is not specified, we can do this range scan. The only point 
to clarify then is whether to stop at the first result or detect the case where 
there are multiple rows and return an error. I am leaning slightly towards the 
former with the assumption that it should be truly rare that there are multiple 
rows for the same entity id (otherwise it would be a bug in the write path) and 
also for performance reasons.
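
To make the "stop at the first result" option concrete, here is a minimal sketch that simulates the range scan with an in-memory sorted map instead of HBase. Everything in it is an assumption for illustration: the class {{EntityRangeScan}}, the string row keys with a zero-padded prefix (the real row key is a byte array), and the explicit entity id check on each row, which the sketch needs because rows of other entities of the same type can also fall inside the range.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the HBase range scan. */
final class EntityRangeScan {

    /**
     * Builds a string row key. The prefix is zero-padded to 19 digits so
     * that string order matches numeric order for non-negative longs.
     */
    static String rowKey(String context, long idPrefix, String entityId) {
        return context + "!" + String.format("%019d", idPrefix) + "!" + entityId;
    }

    /**
     * Scans from prefix 0 to Long.MAX_VALUE and returns the first row that
     * belongs to the given entity id. Assumption: entity ids contain no '!'.
     */
    static Optional<String> findFirst(NavigableMap<String, String> table,
                                      String context, String entityId) {
        String startRow = rowKey(context, 0L, entityId);
        // The map lets us use an inclusive stop row; an exclusive stop row
        // would need the next key after this one instead.
        String stopRow = rowKey(context, Long.MAX_VALUE, entityId);
        for (Map.Entry<String, String> row
                : table.subMap(startRow, true, stopRow, true).entrySet()) {
            if (row.getKey().endsWith("!" + entityId)) {
                return Optional.of(row.getValue()); // stop at the first match
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        String context = "cluster!user!flow!1!app_1!TYPE";
        table.put(rowKey(context, 3L, "other_entity"), "other");
        table.put(rowKey(context, 7L, "entity_1"), "wanted");
        System.out.println(findFirst(table, context, "entity_1"));
    }
}
```

Detecting multiple rows for the same entity id instead would mean continuing the loop after the first match, which is the extra cost the paragraph above argues against.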

For those cases where there was no id prefix (i.e. default) written, clients 
should still set the id prefix (to 0) so that it becomes the first use case (1).

I'll go over the patch and post my feedback today. Thanks.

> [Atsv2] Reader side changes for entity prefix and support for pagination via 
> additional filters
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
>         Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, 
> YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, 
> YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities, 
> the REST call gives the first/last 100 entities. How do we retrieve the next 
> set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: if applications app-1, app-2 ... app-10 are stored in the database, 
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the 
> next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS is targeting storage of a large number of entities, it is a very 
> common use case to get the next set of entities using fromId rather than 
> querying all the entities. This is very useful for pagination in the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
