[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472083#comment-15472083
 ] 

Li Lu commented on YARN-5585:
-----------------------------

I think we're overcomplicating the problem here... I believe the general use 
case of this JIRA is mostly pagination: given a uniquely defined type of 
entities in one application, if the total number of entities is greater than 
the given limit, can we provide an API that allows fetching the data in 
multiple batches? Say we have <entity_001>, <entity_002>, ..., <entity_100>, 
and limit = 10. What we want is to first fetch <entity_001> to 
<entity_010>, then, given fromId = entity_010, fetch <entity_011> to 
<entity_020>, and so on. According to Rohith's use case, I think 
it's totally fine to say that all entities are ordered by their ids 
lexicographically (especially for entities with properly padded numbers, like 
container id). Actually, any consistent order will work for pagination; 
the only problem is how to make it make sense to the users. 
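
To make the fromId idea concrete, here is a minimal Python sketch, not the 
timeline reader implementation. It assumes ids are zero-padded so that 
lexicographic order matches the intended order. The name fetch_page and the 
in-memory sorted list are hypothetical stand-ins for a scan over sorted row 
keys.

```python
from bisect import bisect_right

def fetch_page(sorted_ids, limit, from_id=None):
    """Return up to `limit` ids that sort strictly after `from_id`.

    `sorted_ids` must already be in lexicographic order (assumption).
    """
    # With no from_id we start at the beginning; otherwise we skip past it.
    start = 0 if from_id is None else bisect_right(sorted_ids, from_id)
    return sorted_ids[start:start + limit]

# entity_001 .. entity_100, zero-padded so string order equals numeric order
entity_ids = [f"entity_{i:03d}" for i in range(1, 101)]

first = fetch_page(entity_ids, 10)
second = fetch_page(entity_ids, 10, from_id=first[-1])
```

Each call only needs the last id of the previous page, so the client keeps 
no other state between requests.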

The real problem here is that we need to return everything sorted by 
creation time, which seems to be quite hard in our current data model. 
This was pretty easy in ATS v1, where creation time is baked into the row key 
for each entity. I remember there were some discussions about this a while 
ago, but the general conclusion was that we mainly rely on the use cases 
themselves to guarantee consistency between creation time and entity id. To 
me, the potential problem with sorting entities by creation time to implement 
pagination is that we first have to fetch _all_ of them from HBase to 
establish the order, which defeats most of the advantage of pagination. 
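
A rough sketch of that cost, in Python and under simplifying assumptions: 
rows are (entity_id, created_ts) tuples held in id order, and 
page_by_creation_time is a hypothetical helper, not an existing reader 
method. Because creation time is not part of the key order, every row has to 
be read before even the first page can be returned.

```python
def page_by_creation_time(rows, limit, offset=0):
    """Return (page, rows_read) for rows stored in id order.

    rows: list of (entity_id, created_ts) tuples (assumed layout).
    """
    # The stored order says nothing about creation time, so all rows
    # must be read and sorted before any page can be cut out.
    rows_read = len(rows)
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    return ordered[offset:offset + limit], rows_read

# Stored in id order; creation times do not follow the ids.
rows = sorted([("entity_003", 50), ("entity_001", 70), ("entity_002", 10)])
page, rows_read = page_by_creation_time(rows, limit=1)
```

Here a page of one entity still costs a read of all three rows, and the 
same holds for every later page.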

An ID encoder/decoder would be very helpful for this use case. However, having 
the application write the encode/decode logic puts more of a burden on 
application programmers. It also introduces extra work for deployments, 
since cluster operators need to handle third-party plugins. Can we provide 
several "SORT BY" options for timeline entity types, so that we store their ids 
accordingly? 
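
As one possible shape for such an option, here is a Python sketch. 
Everything in it is an assumption for illustration: the sort_by values, the 
"!" separator, the 20-digit padding, and the encode_key/decode_key names. It 
also assumes non-negative integer timestamps. With "creation_time", the 
zero-padded timestamp is stored in front of the id, so plain lexicographic 
key order is also creation order.

```python
def encode_key(entity_id, created_ts, sort_by="id"):
    """Build the stored key for an entity (hypothetical scheme)."""
    if sort_by == "creation_time":
        # Zero-pad so string comparison agrees with numeric comparison.
        return f"{created_ts:020d}!{entity_id}"
    return entity_id

def decode_key(key, sort_by="id"):
    """Recover the entity id from a stored key."""
    if sort_by == "creation_time":
        _, entity_id = key.split("!", 1)
        return entity_id
    return key

entities = [("container_2", 300), ("container_10", 100), ("container_1", 200)]
keys = sorted(encode_key(e, t, "creation_time") for e, t in entities)
ordered = [decode_key(k, "creation_time") for k in keys]
```

With keys stored this way, a fromId-style scan over the key order would 
page through entities in creation order without reading them all first.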

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Example: if applications are stored in the database as app-1 app-2 ... app-10, 
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> This is very useful for pagination in the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
