[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546179#comment-15546179 ]
Rohith Sharma K S commented on YARN-5585:
-----------------------------------------

Thanks Varun for the quick review.

bq. Intention behind having ID_PREFIX in EntityColumn ? According to me, we need not store prefix in the column. Is it because we want to read it back and send it to client ?
Given that your point 5 is valid, id_prefix needs to be stored in the column and returned to the user on read. The intention is that the user can provide fromEntityPrefix as a filter.

bq. No need of GenericEntityReader#calculateTheClosestNextRowKeyForPrefix. Scan#setRowPrefixFilter will do it for you. We should call it the same way as was done previously.
This is an optimization for scanning rows: it lets the reader seek directly to the required row-key and start scanning from there. Say the row-keys are stored in the order below, the limit is 2, and the prefix is unknown. The scan then starts from the first row-key. After fetching 2 rows, the user knows the prefix is 2 and gives fromEntityPrefix as 2 to retrieve the next batch. The reader then does not need to scan rows from the beginning; it can start directly at the row-key prefixed with 2. The stop row still needs to be calculated at the entityType level, i.e. the scan runs through prefix 4.
{code}
cluster!user!flow!flowrun!app!entitytype!1!{entityid}
cluster!user!flow!flowrun!app!entitytype!2!{entityid}
cluster!user!flow!flowrun!app!entitytype!3!{entityid}
cluster!user!flow!flowrun!app!entitytype!4!{entityid}
{code}

bq. As entity ID prefix is a long, EntityRowKeyConverter#SEGMENT_SIZES should have new segment as Bytes.SIZEOF_LONG. It is currently given as VARIABLE_SIZE. Same change in TestRowKeys.
I purposely used VARIABLE_SIZE because the prefix can also be empty bytes when no prefix is specified. If we use Bytes.SIZEOF_LONG, decoding always expects some bytes for the prefix, which is not always the case. When no prefix is specified, I do not want to use a default value that takes an extra byte of storage.

bq. We will have to change Get to Scan with a SingleColumnValueFilter accordingly.
This is an open point in the attached patch. I will look into the feasibility of using the same REST endpoint for prefix-supported entities.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch,
> YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve
> applications. Along with those, it would be good to add a new filter, fromId,
> so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities,
> the REST call gives the first/last 100 entities. How do we retrieve the next
> set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: if applications are stored in the database as app-1 app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the
> next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> Since ATS is targeting storage of a large number of entities, getting the next
> set of entities using fromId, rather than querying all the entities, is a very
> common use case. This is very useful for pagination in the web UI.
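
The seek optimization described in the comment (start the scan at the row-key prefixed with fromEntityPrefix, stop at the end of the entity type) can be illustrated with a minimal, self-contained sketch. This is not code from the attached patch and it does not use HBase: a sorted in-memory set stands in for the table, the row-keys use the simplified textual layout from the comment rather than the encoded long prefix, and every name in it (PrefixSeekSketch, readPage, sampleTable) is hypothetical. The stop row is computed by incrementing the last byte of the entity-type prefix, which is the usual way to derive the closest key that sorts after all keys sharing a prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the YARN-5585 patch.
final class PrefixSeekSketch {

    // Simplified textual row-key layout taken from the comment.
    static final String ENTITY_TYPE_PREFIX =
        "cluster!user!flow!flowrun!app!entitytype!";

    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix, or "" when no such key exists (all bytes are 0xFF).
     */
    static String closestNextRowKeyForPrefix(String prefix) {
        byte[] bytes = prefix.getBytes(StandardCharsets.ISO_8859_1);
        int end = bytes.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && bytes[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return ""; // no upper bound: scan to the end of the table
        }
        byte[] next = Arrays.copyOf(bytes, end);
        next[end - 1]++;
        return new String(next, StandardCharsets.ISO_8859_1);
    }

    /** Reads up to limit rows of the entity type, seeking to fromEntityPrefix. */
    static List<String> readPage(NavigableSet<String> table,
                                 String fromEntityPrefix, int limit) {
        // Without a prefix the scan starts at the first row of the entity type.
        String startRow = (fromEntityPrefix == null)
            ? ENTITY_TYPE_PREFIX
            : ENTITY_TYPE_PREFIX + fromEntityPrefix + "!";
        // The stop row is always computed at the entity-type level.
        String stopRow = closestNextRowKeyForPrefix(ENTITY_TYPE_PREFIX);
        NavigableSet<String> range = stopRow.isEmpty()
            ? table.tailSet(startRow, true)
            : table.subSet(startRow, true, stopRow, false);
        List<String> page = new ArrayList<>();
        for (String row : range) {
            if (page.size() == limit) {
                break;
            }
            page.add(row);
        }
        return page;
    }

    /** Four rows of one entity type plus one row of another type. */
    static NavigableSet<String> sampleTable() {
        NavigableSet<String> table = new TreeSet<>();
        for (int prefix = 1; prefix <= 4; prefix++) {
            table.add(ENTITY_TYPE_PREFIX + prefix + "!entity" + prefix);
        }
        table.add("cluster!user!flow!flowrun!app!othertype!1!entity9");
        return table;
    }

    public static void main(String[] args) {
        NavigableSet<String> table = sampleTable();
        System.out.println(readPage(table, null, 2)); // first batch, no prefix
        System.out.println(readPage(table, "2", 10)); // seek straight to prefix 2
    }
}
```

In this sketch the first call scans from the beginning of the entity type and returns the rows for prefixes 1 and 2. The second call starts at the row prefixed with 2 and returns the rows for prefixes 2 through 4, while the stop row keeps the othertype row out of the result. Textual prefixes sort lexicographically rather than numerically, which is one reason a real row-key would encode the prefix as bytes of a long.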