[
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512079#comment-15512079
]
Vrushali C edited comment on YARN-5585 at 9/22/16 4:23 AM:
-----------------------------------------------------------
I have been thinking more about this. If the concern is having the same entity
data in two tables, we could set a TTL (time to live) on the cells in the
auxiliary table. That way, we store the data in two places for some period of
time, and then the auxiliary copy gets cleaned up.
For example, suppose the Tez UI queries the auxiliary table for a job that ran
1 year ago, and the data no longer exists there because HBase has cleaned it
up. The Tez UI can then try querying the regular table. Alternatively, the
auxiliary REST API call could take a parameter that says: if the data is not
found in the auxiliary table, query the regular entity table. The REST call
would then perhaps take a little longer to return. Since we are querying for
something that ran 1 year ago, I believe we can wait an extra moment for the
call to return.
This way, we store data in two tables for a brief time period, rely on HBase to
clean up cells as per their TTL, and provide a way for frameworks to store/query
their data in harmony with timeline service storage.
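A minimal sketch of the read path described above, assuming an in-memory
stand-in for the two tables rather than real HBase. The names TtlTable,
read_entity and fallback_to_regular are hypothetical and only illustrate the
idea of "try the auxiliary table, then optionally fall back to the regular
entity table"; in HBase itself the expiry would be handled by the store, not by
application code.

```python
import time


class TtlTable:
    """In-memory stand-in for a table whose cells expire after ttl_seconds.

    Hypothetical helper for illustration only; it does not use any HBase API.
    """

    def __init__(self, ttl_seconds=None, clock=time.time):
        self._ttl = ttl_seconds      # None means cells never expire
        self._clock = clock          # injectable clock, useful for testing
        self._cells = {}             # row key -> (value, write timestamp)

    def put(self, key, value):
        self._cells[key] = (value, self._clock())

    def get(self, key):
        cell = self._cells.get(key)
        if cell is None:
            return None
        value, written_at = cell
        # Treat the cell as gone once its age reaches the TTL.
        if self._ttl is not None and self._clock() - written_at >= self._ttl:
            del self._cells[key]
            return None
        return value


def read_entity(key, auxiliary, regular, fallback_to_regular=True):
    """Read from the auxiliary table first, then optionally the regular table.

    Returns a (value, source) pair; both are None when nothing is found.
    """
    value = auxiliary.get(key)
    if value is not None:
        return value, "auxiliary"
    if fallback_to_regular:
        value = regular.get(key)
        if value is not None:
            return value, "regular"
    return None, None
```

With fallback_to_regular left on, a caller asking for an old job pays for a
second lookup only when the auxiliary copy has already expired.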
> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve applications.
> Along with those, it would be good to add a new filter, i.e. fromId, so that
> entities can be retrieved after the fromId.
> Current behavior: The default limit is set to 100. If there are 1000 entities,
> then the REST call gives the first/last 100 entities. How does one retrieve
> the next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: If applications app-1, app-2 ... app-10 are stored in the database,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the
> next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> Since ATS is targeting storage of a large number of entities, getting the
> next set of entities using fromId, rather than querying all the entities, is
> a very common use case. This is very useful for pagination in the web UI.
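
The paging behavior proposed in the description can be sketched as follows.
This is only an illustration over an in-memory list: get_apps, limit and
from_id are hypothetical names mirroring the getApps example, not the actual
TimelineReader implementation, and the sketch assumes that the returned page
starts just after from_id, as in the app-6 to app-10 example.

```python
def get_apps(app_ids, limit=100, from_id=None):
    """Return up to `limit` ids that come after `from_id` in stored order.

    `app_ids` is assumed to already be in the store's iteration order.
    Raises ValueError if `from_id` is given but not present.
    """
    start = 0
    if from_id is not None:
        # Start just past from_id, so the previous page's last id is excluded.
        start = app_ids.index(from_id) + 1
    return app_ids[start:start + limit]


apps = [f"app-{i}" for i in range(1, 11)]

# First page: no from_id, so paging starts at the beginning.
first_page = get_apps(apps, limit=5)

# Next page: pass the last id of the previous page as from_id.
second_page = get_apps(apps, limit=5, from_id=first_page[-1])
```

A web UI would keep the last id of each page it renders and send it back as
from_id to fetch the following page.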