[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546179#comment-15546179
 ] 

Rohith Sharma K S commented on YARN-5585:
-----------------------------------------

Thanks, Varun, for the quick review.

bq. Intention behind having ID_PREFIX in EntityColumn ? According to me, we 
need not store prefix in the column. Is it because we want to read it back and 
send it to client ?
Given that your point 5 is valid, id_prefix needs to be stored in the column 
and returned to the user when reading. Basically, the intention is that the 
user can provide fromEntityPrefix as a filter. 


bq. No need of GenericEntityReader#calculateTheClosestNextRowKeyForPrefix. 
Scan#setRowPrefixFilter will do it for you. We should call it the same way as 
was done previously.
This is an optimization while scanning rows. It lets the reader seek directly 
to the required row key and start scanning from there. Say the row keys are 
stored in the order below. Assume the limit is 2 and the prefix is unknown, so 
scanning starts from the beginning of the row keys. After fetching 2 rows, the 
user knows the prefix is 2 and passes fromEntityPrefix as 2 to retrieve the 
next batch. The reader then need not scan rows from the beginning; it can start 
scanning directly at the row keys prefixed with 2. The stop row needs to be 
calculated at the entityType level, i.e. up to prefix 4.
{code}
cluster!user!flow!flowrun!app!entitytype!1!{entityid}
cluster!user!flow!flowrun!app!entitytype!2!{entityid}
cluster!user!flow!flowrun!app!entitytype!3!{entityid}
cluster!user!flow!flowrun!app!entitytype!4!{entityid}
{code}
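A minimal sketch of how the start and stop rows could be derived, assuming plain byte-wise row-key ordering. The class name PrefixScanBounds and the string-encoded keys are hypothetical and only mirror the listing above; the actual patch encodes the prefix as bytes. The helper uses the common "increment the last non-0xFF byte" technique for turning a prefix into an exclusive stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not the code from the attached patch.
final class PrefixScanBounds {

  // Returns the smallest row key that sorts after every key starting with
  // the given prefix. An empty array means "no upper bound".
  static byte[] closestNextRowKeyForPrefix(byte[] prefix) {
    int end = prefix.length;
    // Trailing 0xFF bytes cannot be incremented, so drop them.
    while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
      end--;
    }
    if (end == 0) {
      return new byte[0];
    }
    byte[] next = Arrays.copyOf(prefix, end);
    next[end - 1]++;
    return next;
  }

  static byte[] key(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Assumed string form of the row-key layout, for illustration only.
    String entityTypePrefix = "cluster!user!flow!flowrun!app!entitytype!";

    // Start directly at the rows whose id prefix is 2 (fromEntityPrefix=2).
    byte[] startRow = key(entityTypePrefix + "2!");
    // Stop at the end of the entity type, not at the end of prefix 2.
    byte[] stopRow = closestNextRowKeyForPrefix(key(entityTypePrefix));

    System.out.println("start: " + new String(startRow, StandardCharsets.UTF_8));
    System.out.println("stop:  " + new String(stopRow, StandardCharsets.UTF_8));
  }
}
```

With bounds like these, the scan begins at the requested prefix but still covers the remaining rows of the entity type, which is the difference from applying a row-prefix filter on the prefix-2 keys alone.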
bq. As entity ID prefix is a long, EntityRowKeyConverter#SEGMENT_SIZES should 
have new segment as Bytes.SIZEOF_LONG. It is currently given as VARIABLE_SIZE. 
Same change in TestRowKeys.
I purposely used VARIABLE_SIZE because the prefix can also be empty bytes when 
no prefix is specified. If we use Bytes.SIZEOF_LONG, decoding always expects 
some bytes for the prefix, which is not always the case. When no prefix is 
specified, I do not want to use a default value that takes extra storage. 
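
A minimal sketch of the trade-off, assuming the prefix segment is encoded on its own. PrefixSegmentCodec is a hypothetical name and this is not the EntityRowKeyConverter code; it only shows why a variable-size segment can represent "no prefix" with zero bytes, while a fixed Bytes.SIZEOF_LONG segment would always need 8.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for the id-prefix segment only.
final class PrefixSegmentCodec {

  // A missing prefix is stored as an empty segment (zero bytes);
  // a present prefix is stored as an 8-byte big-endian long.
  static byte[] encode(Long idPrefix) {
    if (idPrefix == null) {
      return new byte[0];
    }
    return ByteBuffer.allocate(Long.BYTES).putLong(idPrefix).array();
  }

  // An empty segment decodes to null, meaning no prefix was specified.
  static Long decode(byte[] segment) {
    if (segment.length == 0) {
      return null;
    }
    if (segment.length != Long.BYTES) {
      throw new IllegalArgumentException(
          "Unexpected prefix segment length: " + segment.length);
    }
    return ByteBuffer.wrap(segment).getLong();
  }

  public static void main(String[] args) {
    System.out.println("bytes without prefix: " + encode(null).length);
    System.out.println("bytes with prefix:    " + encode(2L).length);
    System.out.println("round trip:           " + decode(encode(2L)));
  }
}
```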

bq. We will have to change Get to Scan with a SingleColumnValueFilter 
accordingly.
This is an open point in the attached patch. I will look into the feasibility 
of using the same REST endpoint for prefix-supported entities. 

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch, 
> YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities, 
> the REST call gives the first/last 100 entities. How do we retrieve the next 
> set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: if applications are stored in the database as app-1 app-2 ... app-10, 
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the 
> next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS targets storage of a large number of entities, getting the next set 
> of entities using fromId rather than querying all the entities is a very 
> common use case. This is very useful for pagination in the web UI.
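
The fromId behavior proposed in the description can be sketched over an in-memory sorted set. This is an illustration only: FromIdPagination and page are hypothetical names, the ids are zero-padded (app-01 ... app-10) so that string order matches numeric order, and fromId is treated as exclusive, following the app-6 to app-10 example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory model of limit + fromId paging.
final class FromIdPagination {

  // Returns up to 'limit' ids that sort strictly after fromId.
  // A null fromId means "start from the first id".
  static List<String> page(NavigableSet<String> ids, String fromId, int limit) {
    Iterable<String> source =
        (fromId == null) ? ids : ids.tailSet(fromId, false);
    List<String> result = new ArrayList<>();
    for (String id : source) {
      if (result.size() == limit) {
        break;
      }
      result.add(id);
    }
    return result;
  }

  static NavigableSet<String> sampleApps() {
    NavigableSet<String> ids = new TreeSet<>();
    for (int i = 1; i <= 10; i++) {
      ids.add((i < 10 ? "app-0" : "app-") + i);
    }
    return ids;
  }

  public static void main(String[] args) {
    NavigableSet<String> ids = sampleApps();
    List<String> first = page(ids, null, 5);
    // The last id of one page becomes the fromId of the next request.
    List<String> second = page(ids, first.get(first.size() - 1), 5);
    System.out.println(first);
    System.out.println(second);
  }
}
```

The client never needs to know an offset; it only echoes back the last id it received, which is what makes this style of paging convenient for a web UI.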


