[
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550930#comment-15550930
]
Rohith Sharma K S commented on YARN-5585:
-----------------------------------------
bq. We also need to be *crystal clear* that timeline clients *must* provide the
same prefix for all subsequent updates of the same entity. I cannot stress that
point enough. Rohith, could you confirm that it is not an issue with Tez to
provide the created time for any subsequent updates for Tez entities?
This is a very important point for TimelineClient users who want to use
prefixId. Although I am in the minority on introducing an *optional* prefixId,
I convinced myself to go ahead with it because optionality (flexibility) is at
least better than a predefined, storage-specific sort order. Still, the
underlying issue is in the storage layer, and we are solving it by pushing it
up to the API as an optional prefix. That exposes a flaw in the API: a user can
mess up the storage, which results in inconsistent data on retrieval.
I had an offline talk with one of the Tez developers, and he is fine with
providing prefixId. He raised two concerns. First, with multiple JVMs the
application programmer has to define a new protocol for transferring the
prefixId. Second, what if a user fails to provide the prefixId in a subsequent
update? The storage would then hold the entity's data in two different entries,
or possibly more.
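
To make the second concern concrete, here is a minimal sketch of how an omitted prefix splits one entity across rows. It assumes a row key of the form (entity type, prefix, entity id) and a default prefix of 0; the dictionary-backed table and the names `row_key`, `put_entity` and `DEFAULT_PREFIX` are hypothetical and are not the actual ATSv2 schema or API.

```python
# Hypothetical in-memory stand-in for the entity table; not the real schema.
DEFAULT_PREFIX = 0  # assumed value used when the client omits the prefix


def row_key(entity_type, entity_id, prefix=None):
    """Build an illustrative row key of the form (type, prefix, id)."""
    return (entity_type, DEFAULT_PREFIX if prefix is None else prefix, entity_id)


def put_entity(table, entity_type, entity_id, info, prefix=None):
    """Merge `info` into the row addressed by the computed row key."""
    table.setdefault(row_key(entity_type, entity_id, prefix), {}).update(info)


table = {}
# The first write supplies a prefix (for example, the entity's created time).
put_entity(table, "TEZ_DAG", "dag_1", {"status": "RUNNING"}, prefix=1475740800)
# A later update, perhaps from another JVM, forgets the prefix.
put_entity(table, "TEZ_DAG", "dag_1", {"status": "SUCCEEDED"})

# The same entity id now lives under two different row keys.
rows_for_entity = sorted(key for key in table if key[2] == "dag_1")
```

In this sketch a reader that looks up either row sees only part of the entity's history, which is the inconsistency described above.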
bq. I'm also realizing that we might have a bug in how we deal with entity
id's. I would have thought that we store the entities in the reverse entity id
order, but it appears that the entity id is encoded into the row key as is
(EntityRowKey). Am I reading that right? If so, this is a bug to fix.
Sorry, I could not quite follow. Could you elaborate? Do you mean reversing
only the entityId, i.e. if the entityId is "12345" then storing "54321", or
reversing the row key itself?
bq. One other thing to deal with is the query by id. There, we need to be able
to distinguish the case where the data do not have the prefix to begin with and
that where data do. Ideally we would simply use the row key explicitly in the
case of data that don't have the prefix to begin with. For those that do have
the prefix, we cannot use the row key to fetch the row so we need to do
something different. I don't think this was done in the current patch, but this
is TBD.
I was thinking of using the same REST API for both cases by using
SingleColumnFilter. One drawback I see is a table scan over all rows of the
entityType, which would hurt read performance.
I will handle the other comments. I will also create the patch on the YARN-5355
branch.
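
The read-cost trade-off can be sketched roughly as follows. This is only an illustration under assumptions: the table is a plain dictionary keyed by (entity type, prefix, entity id), a prefix of 0 stands for "no prefix", and `get_by_row_key` and `get_by_id_filter` are hypothetical helpers, not reader APIs.

```python
# Hypothetical table: row key (entity_type, prefix, entity_id) -> columns.
table = {
    ("TEZ_DAG", 0, "dag_1"): {"id": "dag_1"},     # written without a prefix
    ("TEZ_DAG", 1001, "dag_2"): {"id": "dag_2"},  # written with a prefix
    ("TEZ_DAG", 1002, "dag_3"): {"id": "dag_3"},
    ("TEZ_VERTEX", 1003, "v_1"): {"id": "v_1"},
}


def get_by_row_key(entity_type, entity_id):
    """Point lookup; works only when the full row key is known (no prefix)."""
    return table.get((entity_type, 0, entity_id)), 1  # one row touched


def get_by_id_filter(entity_type, entity_id):
    """Visit every row of the entity type and filter on the id column."""
    examined = 0
    match = None
    for key in sorted(table):
        if key[0] != entity_type:
            continue  # a real scan would be bounded to this entity type
        examined += 1
        if table[key]["id"] == entity_id:
            match = table[key]
    return match, examined


direct = get_by_row_key("TEZ_DAG", "dag_1")
filtered = get_by_id_filter("TEZ_DAG", "dag_3")
```

The filter-based lookup finds prefixed entities without knowing the prefix, but the number of rows examined grows with the number of entities of that type.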
> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch,
> YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve
> applications. Along with those, it would be good to add a new filter, fromId,
> so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities,
> a REST call gives the first/last 100 entities. How do we retrieve the next set
> of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the
> next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> Since ATS is targeting storage of a large number of entities, it is a very
> common use case to get the next set of entities using fromId rather than
> querying all the entities. This is very useful for pagination in the web UI.
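
The paging behavior proposed in the description can be sketched as below. This is a minimal illustration only: the function `get_apps`, its parameters, and the in-memory list are hypothetical stand-ins for the REST endpoint and the backing store, and fromId is treated as exclusive to match the app-5 to app-6 example.

```python
def get_apps(app_ids, limit=100, from_id=None):
    """Return up to `limit` ids that come after `from_id` in stored order."""
    start = 0
    if from_id is not None:
        # Exclusive start, as in the example; an unknown id raises ValueError.
        start = app_ids.index(from_id) + 1
    return app_ids[start:start + limit]


# Ten applications in stored order: app-1 ... app-10.
apps = [f"app-{i}" for i in range(1, 11)]

first_page = get_apps(apps, limit=5)
# The client passes the last id it received to fetch the next page.
second_page = get_apps(apps, limit=5, from_id=first_page[-1])
```

A web UI would repeat the second call with the last id of each page until fewer than `limit` entries come back.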