[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472669#comment-15472669 ]

Varun Saxena commented on YARN-5585:
------------------------------------

Another solution that comes to mind is to keep a second table, say
EntityCreationTable, with row key
{{cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid}}.
We would make an entry in this table whenever the created time is reported for
an entity. The real data would still reside in the main entity table. Because
the creation time is stored reversed, entities in this table will be sorted in
descending order of creation time (newest first).

As the goal is pagination, we can introduce a query param such as
fromCreatedTime.
The pagination use case is to fetch data in chunks. Say we want the first
10 records: we send a query with a limit of 10 and no fromCreatedTime query
param.
When a query arrives without fromCreatedTime, we start reading from this table
with start row {{cluster!user!flow!flowrun!app!entitytype!}}, up to the number
of records specified by the {{limit}} query param. We can stop as soon as 10
records are found, instead of parsing through all rows as is done right now for
the entity table.
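The early-terminating first-page read can be sketched as follows. A TreeMap stands in for the HBase table because both keep keys sorted; the prefix plays the role of the scan's start row, and the loop stops once the limit is reached or the keys leave the entity type's range. The keys and names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of reading the first page from a sorted table. A TreeMap stands in
 * for the HBase table (both keep keys sorted); the keys are hypothetical.
 */
public class FirstPageScan {

    /** Returns up to {@code limit} row keys that start with {@code prefix}. */
    static List<String> scanFirstPage(TreeMap<String, String> table, String prefix, int limit) {
        List<String> page = new ArrayList<>();
        // tailMap(prefix) plays the role of the scan's start row.
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || page.size() == limit) {
                break; // left the entity type's key range, or the page is full
            }
            page.add(e.getKey());
        }
        return page;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        String prefix = "c1!u1!f1!1!app-1!CONTAINER!";
        for (int i = 0; i < 25; i++) {
            // Two-digit stand-in for the reversed creation time: lower = newer.
            table.put(prefix + String.format("%02d", i) + "!container-" + i, "");
        }
        table.put("c1!u1!f1!1!app-1!DAG!00!dag-0", ""); // a different entity type
        System.out.println(scanFirstPage(table, prefix, 10).size()); // 10
    }
}
```

Only the first 10 matching rows are visited; the remaining 15 container rows and the other entity type are never touched.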

If we only want to return the default view of the entity, i.e. entity id, type
and created time, we can return the result set straight away. Otherwise, to get
more detailed data, we need the first and last entities retrieved from
EntityCreationTable, and then make a scan on EntityTable with a Single Column
Value filter over the created time range they define (the code for this is
already there).
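A rough sketch of turning a page from EntityCreationTable into a created time range and keeping only entity-table rows inside it. The in-memory predicate is a stand-in for the server-side created time range filter, and the Entity record and method names are hypothetical. Since page rows are newest first, the first row gives the upper bound and the last row the lower bound.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: derive a created-time range from a page read from the (hypothetical)
 * EntityCreationTable, then keep only entity-table rows inside that range.
 * The in-memory filter stands in for the server-side created-time range filter.
 */
public class CreatedTimeRange {

    record Entity(String id, long createdTime, String info) {}

    /** Page rows are newest first, so the first row bounds the range from above. */
    static List<Entity> detailsForPage(List<Entity> page, List<Entity> entityTable) {
        List<Entity> result = new ArrayList<>();
        if (page.isEmpty()) {
            return result;
        }
        long max = page.get(0).createdTime();
        long min = page.get(page.size() - 1).createdTime();
        for (Entity e : entityTable) {
            if (e.createdTime() >= min && e.createdTime() <= max) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entity> table = List.of(
                new Entity("e1", 100L, "details-1"),
                new Entity("e2", 200L, "details-2"),
                new Entity("e3", 300L, "details-3"),
                new Entity("e4", 400L, "details-4"));
        // Default view from the creation table: id and created time only.
        List<Entity> page = List.of(new Entity("e3", 300L, null), new Entity("e2", 200L, null));
        for (Entity e : detailsForPage(page, table)) {
            System.out.println(e.id() + " " + e.info());
        }
    }
}
```

One caveat of a pure time range: entities outside the page that share a boundary created time would also match, so a real implementation may need to trim the result to the page's entity ids.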

This would still require a full scan within the scope of the entity type, but
most results would be dropped by HBase on the server side itself because of the
created time range filter. Which approach is better, querying the entity table
directly or querying two tables, depends entirely on how many records the
entity table holds within the scope of that entity type.

Once the client gets the first 10 records, it can make the next query for
records 11-20 by populating fromCreatedTime with the created time of the 10th
record. The next scan in EntityCreationTable can be made on that basis.
However, fromId must also be used in conjunction with fromCreatedTime, since
several entities can share the same created time.
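A minimal sketch of the next-page query, assuming the cursor is the (fromCreatedTime, fromId) pair of the last record already returned and that the scan resumes strictly after that row. Including the entity id in the start key is what lets two entities with the same created time be split correctly across pages. The key layout and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Sketch of the next-page query: the cursor is (fromCreatedTime, fromId) of the
 * last record already returned. Key layout and names are hypothetical.
 */
public class NextPageScan {

    static String key(String prefix, long createdTime, String entityId) {
        return prefix + String.format("%019d", Long.MAX_VALUE - createdTime) + "!" + entityId;
    }

    static List<String> nextPage(TreeMap<String, String> table, String prefix,
                                 long fromCreatedTime, String fromId, int limit) {
        List<String> page = new ArrayList<>();
        // Exclusive start: resume strictly after the cursor row.
        String start = key(prefix, fromCreatedTime, fromId);
        for (String k : table.tailMap(start, false).keySet()) {
            if (!k.startsWith(prefix) || page.size() == limit) {
                break;
            }
            page.add(table.get(k));
        }
        return page;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        String prefix = "c1!u1!f1!1!app-1!CONTAINER!";
        // e2 and e3 share a created time; the entity id breaks the tie.
        table.put(key(prefix, 300L, "e1"), "e1");
        table.put(key(prefix, 200L, "e2"), "e2");
        table.put(key(prefix, 200L, "e3"), "e3");
        table.put(key(prefix, 100L, "e4"), "e4");
        // The previous page ended at e2 (created time 200).
        System.out.println(nextPage(table, prefix, 200L, "e2", 10)); // [e3, e4]
    }
}
```

With fromCreatedTime alone, the scan could only resume before or after all rows at time 200, which would either repeat e2 or skip e3.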

For this solution to work, the client must not report the created time of an
entity more than once.

Also, I am not 100% sure, but could a coprocessor be used for this extra call,
so that the client is not involved?

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the
> applications. Along with those, it would be good to add a new filter, i.e.
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ...
> app-10, *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps
> is difficult.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> This is very useful for pagination in web UI.
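The fromId behaviour in the quoted example can be sketched over entities in their stored order. This assumes, from the example, that fromId is exclusive (the page starts after it); returning an empty page for an unknown fromId is a hypothetical design choice, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of fromId-based paging over entities in their stored order, matching the
 * example above (limit=5&fromId=app-5 returns app-6..app-10).
 * Exclusive fromId is assumed from that example.
 */
public class FromIdPaging {

    static List<String> page(List<String> stored, String fromId, int limit) {
        int start = 0;
        if (fromId != null) {
            int idx = stored.indexOf(fromId);
            if (idx < 0) {
                return List.of(); // unknown fromId: nothing to resume from
            }
            start = idx + 1; // exclusive: skip the fromId entity itself
        }
        int end = Math.min(start + limit, stored.size());
        return new ArrayList<>(stored.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> apps = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            apps.add("app-" + i);
        }
        System.out.println(page(apps, null, 5));    // [app-1, app-2, app-3, app-4, app-5]
        System.out.println(page(apps, "app-5", 5)); // [app-6, app-7, app-8, app-9, app-10]
    }
}
```

Paging by stored order rather than by string comparison matters here: sorted as plain strings, app-10 would come before app-2.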



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)