[
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469802#comment-15469802
]
Rohith Sharma K S commented on YARN-5585:
-----------------------------------------
bq. Are we selecting entities whose ID is less than start value, or we're
filtering them out? According to your description fromId = app-5 should return
something like app-6 to 10, right? I think it's very important to clearly
define the exact meaning of "fromId"?
*fromId* is a query parameter that users pass in the REST URL, similar to
limit. When entities are retrieved from storage, i.e. HBase, only entities
whose ID is less than the start value are given to the HBase client. The HBase
client then processes this ResultScanner and returns the entities.
Ex: Assume that *entity-1, entity-2 ... entity-10* are stored in HBase in a row.
Current behavior without fromId:
# When a REST call is made to obtain entities, the output is
*entity-10, entity-9 ... entity-2, entity-1*.
# When a REST call is made with the filter {{limit=5}}, the output is
*entity-10, entity-9 ... entity-6*. Note that limit is not applied at the
storage level. Rather, limit is applied to the scanned rows, i.e. the HBase
ResultScanner gives *ALL* the rows, entity-1 to entity-10, and
{{TimelineEntityReader#readEntities}} limits the number of rows given to the
user.
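The current behavior can be sketched as below. This is only an illustration in plain Java, not the real reader code: the class {{LimitAfterScanSketch}}, its methods, and the {{entity-N}} ID format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reader side; not the real TimelineEntityReader.
class LimitAfterScanSketch {
  // Numeric suffix of an ID such as "entity-10" (assumed ID format).
  static int seq(String id) {
    return Integer.parseInt(id.substring(id.lastIndexOf('-') + 1));
  }

  // Every scanned row reaches the client; limit is applied only afterwards.
  static List<String> readEntities(List<String> scannedRows, int limit) {
    List<String> sorted = new ArrayList<>(scannedRows);
    // Highest ID first, matching the output order described above.
    sorted.sort((a, b) -> Integer.compare(seq(b), seq(a)));
    return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
  }
}
```

Here all ten rows are handed to {{readEntities}} even though only five are returned to the user.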
After the patch, i.e. with fromId as a filter:
# When a REST call is made with the filters {{limit=5}} and
{{fromId=entity-6}}, *HBase itself returns only the rows that are less than
entity-6*, i.e. entity-5 to entity-1. This is much more efficient than
processing all the rows at the HBase client, i.e. in
{{TimelineEntityReader#readEntities}}.
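To make the difference concrete, here is a rough sketch of the idea of filtering at the storage side. It simulates storage with an in-memory list and does not use the HBase API; {{FromIdPushdownSketch}}, {{scan}} and the {{entity-N}} ID format are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a storage-side fromId filter; no HBase calls.
class FromIdPushdownSketch {
  // Numeric suffix of an ID such as "entity-6" (assumed ID format).
  static int seq(String id) {
    return Integer.parseInt(id.substring(id.lastIndexOf('-') + 1));
  }

  // Only rows whose ID is less than fromId leave "storage".
  // A null fromId means no filter, so every row is returned.
  static List<String> scan(List<String> storedRows, String fromId) {
    List<String> result = new ArrayList<>();
    for (String id : storedRows) {
      if (fromId == null || seq(id) < seq(fromId)) {
        result.add(id);
      }
    }
    return result;
  }
}
```

With ten stored entities and {{fromId=entity-6}}, this scan hands five rows to the client instead of ten.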
Basically, to the user, fromId is nothing but the starting point for the next
set of entities.
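From the user's side, paging with fromId could look roughly like the sketch below. It assumes the intended newest-first order and simulates the server with an in-memory list; {{PaginationSketch}}, {{fetchPage}} and {{allPages}} are hypothetical names, not part of the REST API or the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pagination walk-through; the "server" is just a list.
class PaginationSketch {
  // Numeric suffix of an ID such as "entity-6" (assumed ID format).
  static int seq(String id) {
    return Integer.parseInt(id.substring(id.lastIndexOf('-') + 1));
  }

  // Simulated server: newest-first page of IDs strictly below fromId.
  // A null fromId means "start from the newest entity".
  static List<String> fetchPage(List<String> stored, int limit, String fromId) {
    List<String> eligible = new ArrayList<>();
    for (String id : stored) {
      if (fromId == null || seq(id) < seq(fromId)) {
        eligible.add(id);
      }
    }
    eligible.sort((a, b) -> Integer.compare(seq(b), seq(a)));
    return new ArrayList<>(eligible.subList(0, Math.min(limit, eligible.size())));
  }

  // Client: the last ID of each page becomes the fromId of the next request.
  static List<List<String>> allPages(List<String> stored, int limit) {
    List<List<String>> pages = new ArrayList<>();
    String fromId = null;
    while (true) {
      List<String> page = fetchPage(stored, limit, fromId);
      if (page.isEmpty()) {
        break;
      }
      pages.add(page);
      fromId = page.get(page.size() - 1);
    }
    return pages;
  }
}
```

For ten entities and {{limit=5}}, the first page ends at entity-6, so the second request carries {{fromId=entity-6}} and returns entity-5 to entity-1.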
bq. Because we're selecting entities starting from a given ID, can we directly
pass in the fromID's key when creating the scan? In this way seems like we
saved one filter? For example, if fromId is not provided, we may want to scan
from cluster!user!flow!flowrun!appId!type, but if fromId is provided, we can
start from cluster!user!flow!flowrun!appId!type!fromId (or the next available
entity)?
This is a good point. But as you said in an earlier comment, entities are not
stored in order. The order can be like
entity-9, entity-5, entity-6, entity-2 ... entity-10. So, IIUC, this cannot be
achieved.
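A small sketch of why a start row and a filter can disagree when rows are not stored in ID order. The five-row physical order used in the usage note is a made-up subset for illustration, and {{StartRowSketch}} with its two methods is hypothetical; nothing here calls HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of "start the scan at fromId" vs. "filter by ID".
class StartRowSketch {
  // Numeric suffix of an ID such as "entity-6" (assumed ID format).
  static int seq(String id) {
    return Integer.parseInt(id.substring(id.lastIndexOf('-') + 1));
  }

  // Scan that begins right after fromId's position in the physical order.
  static List<String> scanFromStartRow(List<String> physicalOrder, String fromId) {
    int start = physicalOrder.indexOf(fromId) + 1;
    return new ArrayList<>(physicalOrder.subList(start, physicalOrder.size()));
  }

  // Scan that checks every row and keeps IDs less than fromId.
  static List<String> scanWithFilter(List<String> physicalOrder, String fromId) {
    List<String> result = new ArrayList<>();
    for (String id : physicalOrder) {
      if (seq(id) < seq(fromId)) {
        result.add(id);
      }
    }
    return result;
  }
}
```

With the assumed physical order entity-9, entity-5, entity-6, entity-2, entity-10 and {{fromId=entity-6}}, the start-row scan yields entity-2 and entity-10, while the filter yields entity-5 and entity-2.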
bq. For pagination on containers, why do we need to care about actual creation
time when the entity ids have already been sorted? This said, supporting
paginations for generic timeline entities should not be blocked by YARN-5094?
Entities with creationTime set are returned in descending order of entityId.
If creationTime is not set, the result is in the reverse order, i.e. ascending
order of entityId. This is because of the implementation of
{{TimelineEntity#compareTo}}. So, say {{limit=2 and fromId=entity-6}}: the rows
retrieved from storage are entity-5 to entity-1, but the REST output given to
the user is entity-1 and entity-2 rather than entity-5 and entity-4.
This is because of the {{TimelineEntityReader#readEntities}} implementation.
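The ordering effect can be reproduced with a simplified comparator. This is a sketch under assumptions, not the actual {{TimelineEntity#compareTo}}: the class {{OrderingSketch.Entity}} is hypothetical, an unset creation time is modeled as {{null}}, and the created-time values in the usage note are invented so that newer entities have higher IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified entity; it only mirrors the ordering described above.
class OrderingSketch {
  static final class Entity implements Comparable<Entity> {
    final String id;
    final Long createdTime; // null models "creation time not set" (assumption)

    Entity(String id, Long createdTime) {
      this.id = id;
      this.createdTime = createdTime;
    }

    @Override
    public int compareTo(Entity other) {
      if (createdTime == null && other.createdTime == null) {
        return id.compareTo(other.id); // no times at all: ascending by ID
      }
      if (createdTime == null) {
        return 1; // entities without a time sort last
      }
      if (other.createdTime == null) {
        return -1;
      }
      int byTime = Long.compare(other.createdTime, createdTime); // newest first
      return byTime != 0 ? byTime : id.compareTo(other.id);
    }
  }

  // Sorts the rows with compareTo and keeps the IDs of the first `limit` rows.
  static List<String> firstIds(List<Entity> rows, int limit) {
    List<String> ids = new ArrayList<>();
    for (Entity e : new TreeSet<>(rows)) {
      if (ids.size() == limit) {
        break;
      }
      ids.add(e.id);
    }
    return ids;
  }
}
```

Given rows entity-5 to entity-1 and {{limit=2}}, this sketch returns entity-1 and entity-2 when no creation time is set, and entity-5 and entity-4 when creation times increase with the ID.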
YARN-5094 blocks testing of YARN-CONTAINER entities because most of the events
have a creation time of -1, so the result is always the first N containers
when fromId is used. I have tested with a TEZ application, where fromId works
correctly.
> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the
> applications. Along with those, it would be good to add a new filter, i.e.
> fromId, so that entities can be retrieved after the fromId.
> Example: If applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-10. But to retrieve the next 5 apps, it is
> difficult.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> This is very useful for pagination in the web UI.