[
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455257#comment-15455257
]
Varun Saxena edited comment on YARN-5585 at 9/1/16 12:40 PM:
-------------------------------------------------------------
So I thought a little more about it, and I think there is a possible solution
for fetching apps within a cluster without much performance impact, since this
seems to be your use case.
What we can do is first get the required app IDs from the app to flow table,
as app IDs in this table are sorted, and extract the applicable flows from
there. We can then query the application table using these unique flows to
get more specific information about the apps. HBase has something called
MultiRowRangeFilter which lets us specify multiple row key ranges.
We then return only those apps which we found in the app to flow table.
From a performance viewpoint, we can assume there will always be a
reasonable limit specified.
_Example:_
Assume that in a cluster we have applications from application_1111111_0001 to
application_1111111_0034 (running or completed).
These apps will be stored in descending order in the app to flow table.
Let us say you want to get the latest 10 apps (i.e. limit in your query is 10).
We can get the first 10 apps from the app to flow table, i.e.
application_1111111_0034 to application_1111111_0025, using PageFilter to
return only the first 10 records. This is the result set we will return.
Assume application IDs ending with _0034, _0031 and _0027 belong to flow1 and
the rest to flow2. We can then use this info to query the app table.
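The first step of the example can be sketched in plain Java. This is only an illustration: the TreeMap is a hypothetical in-memory stand-in for the app to flow table, the first-N loop plays the role PageFilter has in the text, and the class and method names (AppToFlowLookup, sampleAppToFlow, latestAppsByFlow) are made up for this sketch. Putting every app older than _0025 into flow2 is also just an assumption of the sample data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AppToFlowLookup {

    // Hypothetical stand-in for the app to flow table:
    // app ID -> flow name, iterated in descending app ID order.
    static TreeMap<String, String> sampleAppToFlow() {
        TreeMap<String, String> table =
            new TreeMap<String, String>(Comparator.<String>reverseOrder());
        for (int i = 1; i <= 34; i++) {
            String appId = String.format("application_1111111_%04d", i);
            boolean inFlow1 = (i == 34 || i == 31 || i == 27);
            table.put(appId, inFlow1 ? "flow1" : "flow2");
        }
        return table;
    }

    // Read only the first `limit` rows and group their app IDs by flow.
    // App IDs stay in descending order within each flow.
    static Map<String, List<String>> latestAppsByFlow(
            TreeMap<String, String> table, int limit) {
        Map<String, List<String>> byFlow = new LinkedHashMap<>();
        int seen = 0;
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (seen++ >= limit) {
                break;
            }
            byFlow.computeIfAbsent(row.getValue(), f -> new ArrayList<>())
                  .add(row.getKey());
        }
        return byFlow;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byFlow =
            latestAppsByFlow(sampleAppToFlow(), 10);
        byFlow.forEach((flow, apps) -> System.out.println(flow + " -> " + apps));
    }
}
```

The per-flow lists produced here are what the next step needs to work out the row key ranges for the application table.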
So to get detailed info for these 10 apps in a single shot from the application
table, we can do the following:
* Create a MultiRowRangeFilter.
* For flow1, add start row as {{cluster!user!flow1!application_1111111_0034}}
and stop row as {{cluster!user!flow1!application_1111111_0027}}. We can make
the stop row inclusive. We can then add this start/stop row pair to the multi
row range filter created.
* For flow2, start row can be
{{cluster!user!flow2!application_1111111_0033}} and stop row
{{cluster!user!flow2!application_1111111_0024}} (exclusive, so that
application_1111111_0025 is still included). We can then add this
start/stop row pair to the multi row range filter created.
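A rough sketch of how the start/stop row pairs could be derived from a flow to app IDs map is given below. It assumes row keys are the plain {{cluster!user!flow!appId}} strings used in this example (the real application table row key encoding may differ), and Range is a hypothetical local type, not an HBase class. In HBase each pair would presumably be wrapped in a MultiRowRangeFilter.RowRange; that call is left out so the sketch runs without HBase on the classpath. Both ends are made inclusive here, so for flow2 the stop row is the oldest app in the page (_0025) rather than the next lower ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlowRowRanges {

    // Hypothetical local type holding the four values of one row range.
    static final class Range {
        final String startRow;
        final boolean startInclusive;
        final String stopRow;
        final boolean stopInclusive;

        Range(String startRow, boolean startInclusive,
              String stopRow, boolean stopInclusive) {
            this.startRow = startRow;
            this.startInclusive = startInclusive;
            this.stopRow = stopRow;
            this.stopInclusive = stopInclusive;
        }

        @Override
        public String toString() {
            return (startInclusive ? "[" : "(") + startRow + ", "
                + stopRow + (stopInclusive ? "]" : ")");
        }
    }

    // Build one range per flow, from the newest to the oldest app ID found
    // for that flow. Each list is assumed to be in descending app ID order.
    static List<Range> buildRanges(String cluster, String user,
                                   Map<String, List<String>> appsByFlow) {
        List<Range> ranges = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : appsByFlow.entrySet()) {
            List<String> apps = e.getValue();
            if (apps.isEmpty()) {
                continue; // nothing to fetch for this flow
            }
            String prefix = cluster + "!" + user + "!" + e.getKey() + "!";
            String newest = apps.get(0);
            String oldest = apps.get(apps.size() - 1);
            ranges.add(new Range(prefix + newest, true, prefix + oldest, true));
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<String, List<String>> appsByFlow = new LinkedHashMap<>();
        appsByFlow.put("flow1", Arrays.asList(
            "application_1111111_0034", "application_1111111_0031",
            "application_1111111_0027"));
        appsByFlow.put("flow2", Arrays.asList(
            "application_1111111_0033", "application_1111111_0032",
            "application_1111111_0030", "application_1111111_0029",
            "application_1111111_0028", "application_1111111_0026",
            "application_1111111_0025"));
        for (Range r : buildRanges("cluster", "user", appsByFlow)) {
            System.out.println(r);
        }
    }
}
```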
This would be slower than getting all apps when a flow or flow run is specified,
but would be faster than doing a full table scan of the application table,
especially when it grows large.
Maybe I can raise a separate JIRA for this and handle it there if this is a
real use case.
> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
>
> TimelineReader REST APIs provide a lot of filters to retrieve the
> applications. Along with those, it would be good to add a new filter, i.e.
> fromId, so that entities can be retrieved after the fromId.
> Example: If applications are stored in the database as app-1 app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is
> difficult.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> This is very useful for pagination in the web UI.
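
The fromId behaviour described in the issue could look roughly like the sketch below. It is only an illustration over an in-memory list: the class and method names (FromIdPaging, page) are hypothetical, fromId is treated as exclusive as in the description, and returning an empty page for an unknown fromId is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class FromIdPaging {

    // Return up to `limit` IDs that come after `fromId` in stored order.
    // A null fromId means "start from the beginning".
    static List<String> page(List<String> storedIds, String fromId, int limit) {
        int start = 0;
        if (fromId != null) {
            int idx = storedIds.indexOf(fromId);
            if (idx < 0) {
                return new ArrayList<>(); // assumption: unknown fromId gives no results
            }
            start = idx + 1; // fromId itself is excluded
        }
        int end = Math.min(storedIds.size(), start + Math.max(limit, 0));
        return new ArrayList<>(storedIds.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add("app-" + i);
        }
        System.out.println(page(ids, null, 5));
        System.out.println(page(ids, "app-5", 5));
    }
}
```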