[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552593#comment-15552593
 ] 

Varun Saxena commented on YARN-5585:


bq. Given entities are sorted in ascending order, to some extent latest-first 
order can be achieved by doing a reverse scan. I tried this for 
yarn-containers and it works fine.
A reverse scan would work fine, but how do we decide which entity types need 
it and which don't? By the way, do we need container IDs in reverse order too? 
IIRC, in one of the calls Li mentioned that lexicographic order should be fine 
for the new Web UI. If required, we can have special handling for YARN-specific 
entities like app attempts and containers, just like we have for apps.
No matter what we do, it should be consistent across all entities. We can also 
have another query param to indicate that reverse lexicographic order is required.

bq. IIUC, AM can delegate collector address to any of its running containers to 
publish its own data. TimelineClient can not be restricted to only AM.
True. In a secure setup, the AM can even pass on the token. The point is that we 
support talking to the AM only. The AM can then delegate its work to anyone. But 
the concern here was that the prefix would have to be passed around by the AM via 
a new protocol. If an application wants to support delegating work to other 
processes, it needs to open a new protocol anyway. So I guess this concern is not 
specific to the prefix. Correct? However, it would be useful if you could describe 
the use case for multiple JVMs. Same DAGs can be executed by different processes. 
This would help us 
thing 

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch, 
> YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide many filters for retrieving 
> applications. Along with those, it would be good to add a new filter, fromId, 
> so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is 100. If there are 1000 entities, 
> the REST call returns the first/last 100 entities. How do we retrieve the next 
> set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: suppose applications are stored in the database as app-1, app-2, ..., 
> app-10. *getApps?limit=5* gives app-1 to app-5, but there is no way to 
> retrieve the next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS targets storage of a large number of entities, getting the next set 
> of entities using fromId, rather than querying all the entities, is a very 
> common use case. This is very useful for pagination in the web UI.
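
The paging behavior proposed in the description can be sketched roughly as follows. This is only an illustration over an in-memory sorted set, not the TimelineReader implementation: the class name FromIdPaging, the method page, and the zero-padded ids (used so that lexicographic order matches numeric order) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical illustration only; not the TimelineReader implementation. */
final class FromIdPaging {

  /**
   * Returns up to {@code limit} ids that sort strictly after {@code fromId}.
   * A null fromId means "start from the first id".
   */
  static List<String> page(NavigableSet<String> sortedIds, String fromId,
      int limit) {
    NavigableSet<String> candidates =
        (fromId == null) ? sortedIds : sortedIds.tailSet(fromId, false);
    List<String> result = new ArrayList<>();
    for (String id : candidates) {
      if (result.size() >= limit) {
        break;
      }
      result.add(id);
    }
    return result;
  }

  public static void main(String[] args) {
    // Zero-padded ids so that lexicographic order matches numeric order.
    NavigableSet<String> apps = new TreeSet<>();
    for (int i = 1; i <= 10; i++) {
      apps.add(String.format("app-%02d", i));
    }
    List<String> first = page(apps, null, 5);
    // The last id of one page becomes the fromId of the next request.
    List<String> next = page(apps, first.get(first.size() - 1), 5);
    System.out.println(first);
    System.out.println(next);
  }
}
```

The client only has to remember the last id it received, so the server does not need to re-read the entities that were already returned.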



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552548#comment-15552548
 ] 

Rohith Sharma K S commented on YARN-5585:
-

bq. Then the entity id order is really the earliest first. Is that what we 
intended?
Given that entities are sorted in ascending order, to some extent latest-first 
order can be achieved by doing a reverse scan. I tried this for yarn-containers 
and it works fine. 

bq.  It would be the client's responsibility to ensure correct data gets in
It's not about what entity data gets stored but about the number of extra rows 
that get added in HBase. Say a user publishes an entity with a prefix one time, 
and the next time publishes the same entity with a different prefix or no prefix. 
Since there is no server-side validation of each entity update, unnecessary 
rows get added for the same entityId. 
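
A small sketch of why this happens, assuming a simplified string row key of the form entityType!idPrefix!entityId (the real EntityRowKey has more components and a binary encoding). The class PrefixRowKeyDemo and its methods are hypothetical, and a TreeMap stands in for the HBase table.

```java
import java.util.TreeMap;

/** Hypothetical illustration; a TreeMap stands in for the HBase table. */
final class PrefixRowKeyDemo {

  /** Simplified, assumed row key layout: entityType!idPrefix!entityId. */
  static String rowKey(String entityType, String idPrefix, String entityId) {
    return entityType + "!" + idPrefix + "!" + entityId;
  }

  /** Stores one update; updates with the same row key overwrite each other. */
  static void publish(TreeMap<String, String> table, String entityType,
      String idPrefix, String entityId, String payload) {
    table.put(rowKey(entityType, idPrefix, entityId), payload);
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    publish(table, "DAG", "100", "dag_1", "created");
    publish(table, "DAG", "100", "dag_1", "running");  // same prefix: same row
    publish(table, "DAG", "", "dag_1", "finished");    // prefix omitted: new row
    System.out.println(table.size() + " rows for one entity id: "
        + table.keySet());
  }
}
```

Because the prefix is part of the key, the store has no way to tell that the two rows describe the same entity unless the client keeps the prefix stable.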

bq. Also note that we expect the AM to be the sole client for a given YARN app.
IIUC, AM can delegate collector address to any of its running containers to 
publish its own data. TimelineClient can not be restricted to *only AM*.





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552438#comment-15552438
 ] 

Sangjin Lee commented on YARN-5585:
---

bq. Entity IDs' can be anything. Even a completely alphabetical sequence can be 
an entity ID. So it will not be possible to define a reverse order for every 
generic entity ID. Is this your question ?

Yes, it was more of a realization on my part on how it behaves. For some 
reason, I thought that we would return the most recent entities first (i.e. 
reverse order of the entity id's). For example, if we had entity_0, entity_1, 
..., entity_9, and queried with limit = 5, I had thought that we would return 
entity_5 through entity_9. Now I realize we would return entity_0 through 
entity_4 (that also explains some of Rohith's early comments). Then the entity 
id order is really the earliest first. Is that what we intended? I know 
"reversing" an arbitrary string is not easy, but I want to make sure if we're 
on the same page and if there is a way to accomplish the most recent entity 
order.
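
The difference between the two orders can be sketched with a sorted in-memory set. This is only an illustration: ScanOrderDemo and firstN are hypothetical names, and descendingSet() stands in for a reverse scan over the stored rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical illustration of forward versus reverse iteration with a limit. */
final class ScanOrderDemo {

  /** Takes the first {@code limit} ids in the iteration order of {@code ids}. */
  static List<String> firstN(Iterable<String> ids, int limit) {
    List<String> out = new ArrayList<>();
    for (String id : ids) {
      if (out.size() >= limit) {
        break;
      }
      out.add(id);
    }
    return out;
  }

  public static void main(String[] args) {
    NavigableSet<String> ids = new TreeSet<>();
    for (int i = 0; i < 10; i++) {
      ids.add("entity_" + i);
    }
    System.out.println(firstN(ids, 5));                  // forward order
    System.out.println(firstN(ids.descendingSet(), 5));  // reverse order
  }
}
```

Note that the reverse order is still purely lexicographic, so it matches "most recent first" only when the ids themselves sort by creation order.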

bq. Secondly, what if a user misses providing a prefixId in subsequent updates?

I agree with Varun on this. Even without the prefix, clients can set any value 
for entities, and the storage will store them per schema. It would be the 
client's responsibility to ensure correct data gets in. Also note that we 
expect the AM to be the sole client for a given YARN app.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552144#comment-15552144
 ] 

Varun Saxena commented on YARN-5585:


bq. I was thinking of using the same REST API for both by using a 
SingleColumnValueFilter. One con I see is a table scan over the whole entityType, 
which would hurt read performance.
We should not use SingleColumnValueFilter if we know the prefix because, as you 
said, it will lead to relatively slower read performance. Basically, we 
need to differentiate between the entity type having a prefix and the user being 
unable to supply it.

bq. I would have thought that we store the entities in the reverse entity id 
order, but it appears that the entity id is encoded into the row key as is 
(EntityRowKey). Am I reading that right? If so, this is a bug to fix.
Entity IDs can be anything. Even a completely alphabetical sequence can be an 
entity ID. So it will not be possible to define a reverse order for every 
generic entity ID. Is this your question?

bq. Firstly about multi JVM which makes application programmer to define new 
protocol for transferring prefixId. 
Trying to understand this more. Can the same DAG be executed by multiple Tez AMs?

bq. Secondly, what if a user misses providing a prefixId in subsequent 
updates? 
This should be caught during the integration phase. Right?





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550930#comment-15550930
 ] 

Rohith Sharma K S commented on YARN-5585:
-

bq. We also need to be *crystal clear* that timeline clients *must* provide the 
same prefix for all subsequent updates of the same entity. I cannot stress that 
point enough. Rohith, could you confirm that it is not an issue with Tez to 
provide the created time for any subsequent updates for Tez entities?
This is a very important point for TimelineClient users who want to use 
prefixId. Even though I am in the minority on introducing an *optional* prefixId, 
I convinced myself to go ahead with it because optionality (flexibility) is at 
least better than a predefined, storage-specific sort order. 
That said, the underlying issue is in the storage layer, and we are solving it by 
pushing it up to the API as an optional prefix. This exposes a flaw in the API: 
a user can mess up the storage, which results in inconsistent data on 
retrieval. 
I had an offline talk with one of the Tez developers, and he is fine with 
providing prefixId. He expressed some concerns. Firstly, with multiple JVMs the 
application programmer has to define a new protocol for transferring the prefixId. 
Secondly, what if a user misses providing a prefixId in subsequent updates? 
This would mess up the storage, with the data stored in 2 different entries, or 
possibly more.

bq. I'm also realizing that we might have a bug in how we deal with entity 
id's. I would have thought that we store the entities in the reverse entity id 
order, but it appears that the entity id is encoded into the row key as is 
(EntityRowKey). Am I reading that right? If so, this is a bug to fix.
Sorry, I could not quite follow. Could you explain in a bit more detail? Do you 
mean reversing only the entityId, i.e. if the entityId is "12345" then storing 
"54321", or reversing the row key itself?

bq. One other thing to deal with is the query by id. There, we need to be able 
to distinguish the case where the data do not have the prefix to begin with and 
that where data do. Ideally we would simply use the row key explicitly in the 
case of data that don't have the prefix to begin with. For those that do have 
the prefix, we cannot use the row key to fetch the row so we need to do 
something different. I don't think this was done in the current patch, but this 
is TBD.
I was thinking of using the same REST API for both by using a 
SingleColumnValueFilter. One con I see is a table scan over the whole entityType, 
which would hurt read performance.

I will handle the other comments. I will also create the patch on the YARN-5355 
branch.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550218#comment-15550218
 ] 

Sangjin Lee commented on YARN-5585:
---

Thanks [~rohithsharma] for contributing the initial patch for this! I have some 
high level comments on this and a couple of specific ones.

(1)
I think the patch does this, but it would be great to leave this prefix 
generic. For Tez, I'm assuming the (inverted) created time would be it. For 
others, it might be something different (something that can be provided 
easily). I think it is useful to have that flexibility. More importantly, it 
should be *optional*. Any framework (and the YARN-generic ones) should be able 
to skip the prefix and expect things to be sorted by the entity id order. I 
think the patch reflects both, but wanted to clarify.

(2)
We also need to be *crystal clear* that timeline clients *must* provide the 
same prefix for all subsequent updates of the same entity. I cannot stress that 
point enough. Rohith, could you confirm that it is not an issue with Tez to 
provide the created time for any subsequent updates for Tez entities?

(3)
I'm also realizing that we might have a bug in how we deal with entity id's. I 
would have thought that we store the entities in the *reverse* entity id order, 
but it appears that the entity id is encoded into the row key as is 
({{EntityRowKey}}). Am I reading that right? If so, this is a bug to fix.

(4)
I agree with Varun that users should provide already inverted values. Users can 
call {{LongConverter.invertLong(createdTime)}} to give us inverted values. We 
also need to make this explicit in the javadoc.
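
As a rough illustration of why an inverted created time yields newest-first ordering, here is a self-contained sketch. It assumes the inversion is Long.MAX_VALUE - value and that the prefix is stored as an 8-byte big-endian value compared byte by byte. The class InvertedPrefixDemo and its methods are hypothetical and do not call the actual LongConverter.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical illustration; does not use the real LongConverter class. */
final class InvertedPrefixDemo {

  /** Assumed inversion: larger inputs map to smaller outputs. */
  static long invert(long value) {
    return Long.MAX_VALUE - value;
  }

  /**
   * Big-endian encoding, so unsigned byte-wise order matches numeric order
   * for non-negative longs.
   */
  static byte[] toBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  public static void main(String[] args) {
    long olderCreatedTime = 1475712000000L;
    long newerCreatedTime = 1475798400000L;
    int cmp = Arrays.compareUnsigned(
        toBytes(invert(newerCreatedTime)), toBytes(invert(olderCreatedTime)));
    // A negative result means the newer entity sorts before the older one.
    System.out.println(cmp < 0 ? "newer sorts first" : "older sorts first");
  }
}
```

Applying the same inversion twice returns the original value, so a reader can recover the created time from the stored prefix under this assumption.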

(5)
I also agree with Varun that we need not store the prefix (again) as a column. 
It would be part of the row key, and as such we should have no problem reading 
it, right?

(6)
One other thing to deal with is the query by id. There, we need to be able to 
distinguish the case where the data do not have the prefix to begin with and 
that where data do. Ideally we would simply use the row key explicitly in the 
case of data that don't have the prefix to begin with. For those that do have 
the prefix, we cannot use the row key to fetch the row so we need to do 
something different. I don't think this was done in the current patch, but this 
is TBD.

(7)
Since this is a subtask for YARN-5355, can we base the patch on that feature 
branch? Thanks!




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15547980#comment-15547980
 ] 

Varun Saxena commented on YARN-5585:


By the way, thinking more about it, we cannot predict how the Separator class 
would change in the future. The code I suggested above is not really tied to 
anything. So we can either adopt the approach above and mandatorily add a test 
case to ensure the stop row does not end with 0xFF, 

or we can adopt what you have done, i.e. copy the relevant code from HBase. 
But if we do that, we should probably have this code in a Utils class instead of 
GenericEntityReader.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546650#comment-15546650
 ] 

Varun Saxena commented on YARN-5585:


bq. I purposefully used VARIABLE_SIZE because prefix can be empty bytes 
That's correct. Sorry, had missed it.

bq. Given your point-5 is valid, id_prefix needs to be stored in a column and 
given back to the user while reading. Basically, the intention is that the user 
can provide fromEntityPrefix as a filter.
When fromEntityPrefix is given as a query param, we will construct the row key 
using it. We do not necessarily need a column. We can use Result#getRow() and 
EntityRowKey#parseRowKey in parseEntity to fetch the prefix, like below.
{code}
EntityRowKey rowKey = EntityRowKey.parseRowKey(result.getRow());
entity.setIdPrefix(rowKey.getEntityIdPrefix());
{code}

bq. After fetching 2 rows, the user knows the prefix is 2 and gives 
fromEntityPrefix as 2 to retrieve the next batch. The reader then need not scan 
rows from the beginning; it can directly start scanning at the row key prefixed 
with 2. The stop row needs to be calculated at the entityType level, i.e. up to 
prefix 4.
Ok, got it. 
But do we need to copy over code from HBase, i.e. 
Scan#calculateTheClosestNextRowKeyForPrefix, for this? What we can do is the 
following:
{code}
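  // Build the row key prefix up to the entity type.
  entityRowKeyPrefix = new EntityRowKeyPrefix(
      context.getClusterId(), context.getUserId(), context.getFlowName(),
      context.getFlowRunId(), context.getAppId(), context.getEntityType());

  // The prefix ends with the separator byte. Overwrite it with 0xFF so the
  // stop row sorts after every row key that starts with the prefix.
  // The cast is needed because 0xFF is an int literal outside the byte range.
  byte[] stopRow = entityRowKeyPrefix.getRowKeyPrefix();
  stopRow[stopRow.length - 1] = (byte) 0xFF;
  scan.setStopRow(stopRow);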
{code}
This works because getRowKeyPrefix returns a byte array ending with 
Separator#QUALIFIERS, i.e. "!", which is 0x21 in hex. QUALIFIERS will never 
end in the string equivalent of 0xFF, so we can safely set the last 
byte to 0xFF and use the result as the stop row.
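
The range check this relies on can be sketched without HBase. The example below assumes plain ASCII row keys separated by "!" and a scan range that is inclusive at the start row and exclusive at the stop row. StopRowDemo and its helpers are hypothetical names, and the prefix must be non-empty.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of the 0xFF stop row idea; no HBase involved. */
final class StopRowDemo {

  /** Copies the (non-empty) prefix and overwrites its last byte with 0xFF. */
  static byte[] stopRowFor(byte[] rowKeyPrefix) {
    byte[] stopRow = Arrays.copyOf(rowKeyPrefix, rowKeyPrefix.length);
    stopRow[stopRow.length - 1] = (byte) 0xFF;
    return stopRow;
  }

  /** True if startRow <= row < stopRow in unsigned lexicographic order. */
  static boolean inRange(byte[] row, byte[] startRow, byte[] stopRow) {
    return Arrays.compareUnsigned(row, startRow) >= 0
        && Arrays.compareUnsigned(row, stopRow) < 0;
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    byte[] prefix = bytes("c!u!f!r!app!TYPE_A!");
    byte[] startRow = bytes("c!u!f!r!app!TYPE_A!2!");  // fromEntityPrefix = 2
    byte[] stopRow = stopRowFor(prefix);
    System.out.println(inRange(bytes("c!u!f!r!app!TYPE_A!3!e1"), startRow, stopRow));
    System.out.println(inRange(bytes("c!u!f!r!app!TYPE_A!1!e0"), startRow, stopRow));
    System.out.println(inRange(bytes("c!u!f!r!app!TYPE_B!1!e0"), startRow, stopRow));
  }
}
```

Arrays.compareUnsigned requires Java 9 or later. It is used here because row keys are compared as unsigned bytes, whereas Java's byte type is signed.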






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-04 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546179#comment-15546179
 ] 

Rohith Sharma K S commented on YARN-5585:
-

Thanks Varun for quick review.. 

bq. Intention behind having ID_PREFIX in EntityColumn ? According to me, we 
need not store prefix in the column. Is it because we want to read it back and 
send it to client ?
Given your point-5 is valid, id_prefix needs to be stored in a column and given 
back to the user while reading. Basically, the intention is that the user can 
provide fromEntityPrefix as a filter. 


bq. No need of GenericEntityReader#calculateTheClosestNextRowKeyForPrefix. 
Scan#setRowPrefixFilter will do it for you. We should call it the same way as 
was done previously.
This is an optimization for scanning rows. It seeks directly to the required 
row key and starts scanning from there. Say the row keys are stored in the order 
below. If the limit is 2 and the prefix is unknown, scanning starts from the 
first row key. After fetching 2 rows, the user knows the prefix is 2 and gives 
fromEntityPrefix as 2 to retrieve the next batch. The reader then need not scan 
rows from the beginning; it can directly start scanning at the row key prefixed 
with 2. The stop row needs to be calculated at the entityType level, i.e. up to 
prefix 4.
{code}
cluster!user!flow!flowrun!app!entitytype!1!{entityid}
cluster!user!flow!flowrun!app!entitytype!2!{entityid}
cluster!user!flow!flowrun!app!entitytype!3!{entityid}
cluster!user!flow!flowrun!app!entitytype!4!{entityid}
{code}
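
A stop row that covers everything under a given prefix can be computed by taking the smallest key that sorts after all keys with that prefix. The sketch below is a simplified version written for this illustration, in the spirit of the calculateTheClosestNextRowKeyForPrefix helper discussed in this thread. It is not copied from HBase or from the patch, and NextRowKeyDemo is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical, simplified illustration; not the HBase or patch code. */
final class NextRowKeyDemo {

  /**
   * Returns the smallest key that sorts after every key starting with
   * {@code prefix}, or an empty array if there is none (all bytes are 0xFF).
   */
  static byte[] closestNextRowKeyForPrefix(byte[] prefix) {
    int end = prefix.length;
    // Trailing 0xFF bytes cannot be incremented, so drop them.
    while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
      end--;
    }
    if (end == 0) {
      // No upper bound exists; the caller would scan to the end of the table.
      return new byte[0];
    }
    byte[] next = Arrays.copyOf(prefix, end);
    next[end - 1]++;  // increment the last remaining byte
    return next;
  }
}
```

With the row keys above, the start row would be built from the entityType prefix plus the fromEntityPrefix value, and the stop row would be the result of this method applied to the entityType prefix alone.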
bq. As entity ID prefix is a long, EntityRowKeyConverter#SEGMENT_SIZES should 
have new segment as Bytes.SIZEOF_LONG. It is currently given as VARIABLE_SIZE. 
Same change in TestRowKeys.
I purposely used VARIABLE_SIZE because the prefix can also be empty bytes when 
no prefix is specified. If we use Bytes.SIZEOF_LONG, decoding would 
always expect some bytes for the prefix, which is not necessarily the case. 
When no prefix is specified, I do not want to use a default value 
that takes extra storage. 

bq. We will have to change Get to Scan with a SingleColumnValueFilter 
accordingly.
This is an open point in the attached patch. I will look into the feasibility of 
using the same REST endpoint for prefix-supported entities. 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545835#comment-15545835
 ] 

Varun Saxena commented on YARN-5585:


Moreover, changes need to be made to GenericEntityReader#getResult as well. 
But I assume that will be done once we decide on the REST APIs, because we need 
to handle 2 cases and hence have two different REST endpoints: one 
where the user queries an entity type which does not have a prefix, and another 
where the entity type is stored with a prefix but the user may or may not supply it. 
We will have to change Get to Scan with a SingleColumnValueFilter accordingly.

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch, 
> YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities, 
> then the REST call gives the first/last 100 entities. How do we retrieve the 
> next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: if applications are stored in the database as app-1 app-2 ... app-10, 
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the 
> next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS is targeting storage of a large number of entities, getting the next 
> set of entities using fromId rather than querying all the entities is a very 
> common use case. This is very useful for pagination in the web UI.
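
The fromId behavior described above can be sketched in plain Java. This is a 
minimal illustration, not the reader implementation: a sorted in-memory set 
stands in for the store, the class and method names are hypothetical, and 
fromId is treated as exclusive because the description says fromId=app-5 
should return app-6 onwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical helper; a sorted set of ids stands in for the backing store.
final class FromIdPagingSketch {

  // Returns at most `limit` ids that sort strictly after `fromId`.
  // A null fromId means "start from the first id".
  static List<String> page(NavigableSet<String> sortedIds,
                           String fromId, int limit) {
    NavigableSet<String> candidates =
        (fromId == null) ? sortedIds : sortedIds.tailSet(fromId, false);
    List<String> page = new ArrayList<>();
    for (String id : candidates) {
      if (page.size() == limit) {
        break;
      }
      page.add(id);
    }
    return page;
  }
}
```

Note that the ordering is lexicographic, so ids such as app-2 and app-10 only 
page in numeric order if they are zero-padded (app-02, app-10).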






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545818#comment-15545818
 ] 

Varun Saxena commented on YARN-5585:


Thanks [~rohithsharma] for the patch. A few comments.

# What is the intention behind having ID_PREFIX in EntityColumn? In my view, we 
need not store the prefix in a column. Is it because we want to read it back 
and send it to the client?
# There is no need for GenericEntityReader#calculateTheClosestNextRowKeyForPrefix. 
Scan#setRowPrefixFilter will do it for you. We should call it the same way as 
was done previously.
# As the entity ID prefix is a long, EntityRowKeyConverter#SEGMENT_SIZES should 
have a new segment of Bytes.SIZEOF_LONG. The same change is needed in 
TestRowKeys.
# In EntityRowKeyConverter#encode, there is no need to invert the entity id 
prefix. We will take the prefix as-is. The sender can publish the entity with 
an inverted prefix if they want contents in descending order (say). We can 
probably add something to TimelineUtils to invert it, if required, which 
clients can then use.
# In GenericEntityReader#parseEntity we should fetch the id prefix from the 
result set and call setIdPrefix on the TimelineEntity to be returned. This will 
be useful for clients when they want to set fromPrefix (it will be useful in 
the Tez UI use case).
# The Javadoc in TimelineReader should be changed. It currently says entities 
would be sorted by created time, which is no longer true.
{code}
   * @return A set of TimelineEntity instances of the given entity
   *         type in the given context scope which matches the given predicates
   *         ordered by created time, descending. Each entity will only contain the
   *         metadata(id, type and created time) plus the given fields to retrieve.
{code}
# We should also update documentation to reflect id prefix.
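
The point about a fixed Bytes.SIZEOF_LONG segment can be illustrated without 
HBase on the classpath. The sketch below assumes the prefix is written as 8 
big-endian bytes and that row keys are compared byte by byte as unsigned 
values, which is how HBase orders row keys. The class and method names are 
hypothetical and this is not EntityRowKeyConverter itself.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of an 8-byte prefix segment in a row key.
final class PrefixSegmentSketch {

  // Same value as HBase's Bytes.SIZEOF_LONG.
  static final int SIZEOF_LONG = Long.BYTES;

  // Encodes the prefix as a fixed-width, big-endian 8-byte segment
  // (ByteBuffer is big-endian by default).
  static byte[] encodePrefix(long idPrefix) {
    return ByteBuffer.allocate(SIZEOF_LONG).putLong(idPrefix).array();
  }

  // Lexicographic comparison treating each byte as unsigned.
  static int compareRowBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

For non-negative prefixes the byte order matches the numeric order, so 2 sorts 
before 10 without any padding decisions. Negative values have the sign bit set 
and therefore sort after all non-negative values under unsigned comparison.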







[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-10-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545552#comment-15545552
 ] 

Hadoop QA commented on YARN-5585:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 5 
new + 24 unchanged - 0 fixed = 29 total (was 24) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
44s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 14s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 44s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 15s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12831536/0001-YARN-5585.patch |
| JIRA Issue | YARN-5585 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 73ab6fc84a67 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ef7f06f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13276/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/13276/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13276/testReport/ |
| modules | C: 

[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-29 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533830#comment-15533830
 ] 

Vrushali C commented on YARN-5585:
--

Thanks [~rohithsharma] for the summary.

bq. 2. By default, use createdTime as entityPrefixId.
Also, that means frameworks that don't want to use the entity id prefix have 
to explicitly specify a null prefix (or a special value that means null).
All the same, it would be really good to mention in the docs that clients 
should do the following. 

{code:title=TimelineWriterClient.java}
entity.setEntityPrefix(createdTime);
client.writeEntity(entity); // pseudo-code
{code}

bq. For the REST end point, we can support fromEntityPrefixId will become 
combination of entityPrefixId+entityId which can be used for pagination
I think pagination handling should be more generic than depending on something 
like "fromEntityPrefixId". REST queries should simply ask for the top N records, 
with the understanding that the records are returned in sorted order of entity 
prefixes. For the next page of results, the client sends back the key/entity 
prefix of the last row returned. For a REST query, if the "startFrom" query 
param is present, the scan starts from the "startFrom" prefix value and returns 
the next N such records.







[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533740#comment-15533740
 ] 

Rohith Sharma K S commented on YARN-5585:
-

In the weekly sync-up meeting, we discussed the proposed solution further. The 
consensus is:
# The proposal of introducing entityPrefixId remains as it is.
# By default, use createdTime as entityPrefixId.
# For the REST endpoint, we can support fromEntityPrefixId, which will be a 
combination of entityPrefixId+entityId and can be used for pagination. For the 
first page, the user need not worry about entityPrefixId and can simply get the 
list of entities. From the second page onwards, the last entityPrefixId of the 
previous output is used to retrieve the next set of entities.
# Single entity retrieval becomes an issue if entityPrefixId is not known. So a 
SingleColumnValueFilter is required for reading single entities.
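
A rough sketch of why the pagination cursor combines entityPrefixId with 
entityId: several entities can share one prefix (for example the same created 
time), so the prefix alone cannot say where the previous page ended. The Key 
class and page method below are hypothetical, and a sorted in-memory set stands 
in for the entity table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Objects;

// Hypothetical illustration of a (prefix, entity id) pagination cursor.
final class PrefixCursorSketch {

  // Rows sort by prefix first, then by entity id to break ties.
  static final class Key implements Comparable<Key> {
    final long idPrefix;
    final String entityId;

    Key(long idPrefix, String entityId) {
      this.idPrefix = idPrefix;
      this.entityId = entityId;
    }

    @Override
    public int compareTo(Key other) {
      int byPrefix = Long.compare(idPrefix, other.idPrefix);
      return byPrefix != 0 ? byPrefix : entityId.compareTo(other.entityId);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Key && compareTo((Key) o) == 0;
    }

    @Override
    public int hashCode() {
      return Objects.hash(idPrefix, entityId);
    }
  }

  // First page: pass from == null. Later pages: pass the last Key of the
  // previous page; rows strictly after it are returned.
  static List<Key> page(NavigableSet<Key> rows, Key from, int limit) {
    NavigableSet<Key> candidates =
        (from == null) ? rows : rows.tailSet(from, false);
    List<Key> page = new ArrayList<>();
    for (Key row : candidates) {
      if (page.size() == limit) {
        break;
      }
      page.add(row);
    }
    return page;
  }
}
```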

[~vrushalic] [~varun_saxena] [~gtCarrera9] and [~vinodkv], please feel free to 
add to or correct the above points.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-29 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533365#comment-15533365
 ] 

Vrushali C commented on YARN-5585:
--

I believe that, one way or the other, the client does need to send in an 
ordering value, whether it is created time or something else. This helps 
frameworks develop their UI-specific queries with much more flexibility and 
makes the timeline service more generic for all frameworks.

Keeping the writes lean is a good goal to have but not at the cost of incurring 
an extra heavy read penalty for the UI user. If we can easily avoid read 
penalties by temporary write amplifications, that would be much more user 
friendly than having the client wait several extra moments to retrieve data. 
Once the UI becomes slow to respond, it becomes harder to use and, to me, that 
ought to be a more important key focus than avoiding temporary write 
amplifications.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532764#comment-15532764
 ] 

Sangjin Lee commented on YARN-5585:
---

Thanks [~rohithsharma] for your comments and input!

I'd like to structure the proposal in a way that hopefully answers some of your 
questions and moves this forward.

To me one of the key goals here is to keep writes lean. In other words, we 
would like to avoid write amplifications (no more auxiliary tables or double 
writes). Then it follows that the client would need to provide this entity 
prefix not only when the entity is written for the first time but also *on all 
subsequent updates*.
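
A small sketch of why the prefix has to accompany every update once it is part 
of the row key. A TreeMap stands in for the entity table, and the row-key 
layout and all names are assumptions made for illustration only.

```java
import java.util.Map;

// Hypothetical illustration; not the ATSv2 writer.
final class UpdateNeedsPrefixSketch {

  // Assumed row-key layout: zero-padded, non-negative prefix, '!', entity id.
  static String rowKey(long idPrefix, String entityId) {
    return String.format("%019d!%s", idPrefix, entityId);
  }

  // A write is a put on the row key. The same prefix and id hit the same
  // row; a different or missing prefix addresses a different row.
  static void write(Map<String, String> table,
                    long idPrefix, String entityId, String info) {
    table.put(rowKey(idPrefix, entityId), info);
  }
}
```

Writing the same entity id twice with prefix 42 updates one row, while a later 
write of that id with prefix 0 creates a second row. Avoiding that without 
client cooperation would need a lookup or an auxiliary table, which is the 
write amplification the comment wants to avoid.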

Providing this entity prefix on all writes and updates may not be practical or 
desired for all cases. I can certainly see that this is not practical for 
YARN-generic entities (e.g. containers). So IMO the *optionality* is a must 
here. If you don't want to have a different sort order than the entity id 
order, you shouldn't be forced to do it.

In terms of what the entity prefix should be if you need it, a strong argument 
can be made for using created time for everyone. However, again, providing the 
created timestamp for all subsequent writes may not be practical. That would 
mean that the AM would need to keep track of the created time for all their 
entities at all times. Perhaps that is trivial for certain AMs, and not for 
others. It's all the more reason to come up with a simple prefix scheme that 
can be easily provided in many situations. For example, if there is a number 
that can be easily computed for your entity, that would be a perfect candidate 
for the entity prefix.

For Tez, if we introduce the entity prefix and you use the created time for 
this, either way it would look exactly the same from the tez perspective. 
Whether we have a more flexible entity prefix or explicit created time (both 
would be in the row key), it would work the same. The client code would do 
either
{code}
entity.setEntityPrefix(createdTime);
client.writeEntity(entity); // pseudo-code
{code}
or
{code}
entity.setCreatedTime(createdTime);
client.writeEntity(entity); // pseudo-code
{code}
The rest of the server code or how data is written, fetched and sorted would 
work in the same manner.

Unfortunately I won't be able to attend today's call as I am away on a 
conference. Hopefully this would help the discussion move forward.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532592#comment-15532592
 ] 

Varun Saxena commented on YARN-5585:


bq. In a distributed cluster, we can expect source of origin of same entity 
types from different JVM. For example in MR, what if YarnChild's want to 
publish its entities with taskId? How can each yarn child knows about 
entityPrefixId? Only uniqueness in cluster will be timestamp.
Frankly, by design, application-level entities will be published by the AM. 
Only it has access to the collector address, and in a secure setup only it will 
have access to the token needed to publish to collectors. We do not forward 
this info to containers. The AM can, however, forward this information to other 
processes, which can then potentially publish entities, but if specific AMs can 
do that, they can easily push the prefix as well. Task-level entities and their 
child entities, however, will be different and will have their own unique 
prefix.

bq. If entityPrefixId is string
We were thinking of it as a long. The intention of the prefix is to help get a 
sort order, and numbers can easily achieve that. We haven't reached a 
conclusion on this, though; it needs further discussion.

bq. If we look at the problem , this issue is from storage layer. 
Frankly, we cannot necessarily say ordering is a storage issue, as no storage 
would naturally provide a created-time sort ordering. Even insertion order is 
not guaranteed. We had to do some plumbing even for LevelDB, and this would be 
even more difficult for HDFS storage. Even for the timeline service as a whole 
(irrespective of storage), technically it should be fine if it provides you a 
way to retrieve the entities you want. 
I understand, though, that entity retrieval in created-time sort order is the 
most common use case. That is why even I was initially of the opinion that we 
should have inherent support for created-time ordering. We can go with an index 
table for created time, as suggested earlier, but this would incur a read-side 
penalty. Or we can have created time as part of the entity table row key, but 
this would mean a write-side penalty too, because you would not know the 
created time of the entity supplied. We can, however, force the user to send 
created time in every entity. 
As you were not there in the last meeting, your point of view was missing. We 
can revisit this in today's meeting.
The only way this can be solved at the timeline service layer without an API 
change is to have another table to assist in retrieval. But this would then 
incur read/write penalties. Can we do something in a coprocessor, i.e. do 
something in prePut or preScan to support the created-time use case? I am not 
really aware of the cost this would incur, so we will have to discuss.

bq. In future, if any other storage is plugged entity prefix would become stale.
Maybe or maybe not. They can potentially use it for indexing as well.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532511#comment-15532511
 ] 

Rohith Sharma K S commented on YARN-5585:
-

I still have concerns about introducing a new field, entityPrefixId, rather 
than using createdTime in the row key.  
# In a distributed cluster, we can expect entities of the same type to 
originate from different JVMs. For example, in MR, what if YarnChild processes 
want to publish their entities with a taskId? How would each YarnChild know the 
entityPrefixId? The only unique value across the cluster will be the timestamp.
# If entityPrefixId is a string, then the user is expected to provide it with 
zero-padded values like 1, 2 etc. It will be very tedious for the user to 
decide how many padding zeros should be used. In a long-running service, one 
can never predict how many entities will be generated.
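
The padding concern in the second point can be seen with a few lines of plain 
Java. The helper names below are hypothetical. The sketch only shows how 
unpadded numeric strings sort lexicographically and how a fixed padding width 
restores numeric order, at the cost of choosing that width up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of string prefixes and zero padding.
final class PaddedPrefixSketch {

  // Sorts a copy using natural String order, i.e. lexicographic order.
  static List<String> sortedCopy(List<String> prefixes) {
    List<String> copy = new ArrayList<>(prefixes);
    Collections.sort(copy);
    return copy;
  }

  // Left-pads a non-negative number with zeros to the given width.
  static String pad(long n, int width) {
    return String.format("%0" + width + "d", n);
  }
}
```

With unpadded values, "10" sorts between "1" and "2". With a width of 4 the 
order is numeric again, but only until the counter exceeds 9999, which is the 
difficulty for a long-running service.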

If we look at the problem, this issue comes from the storage layer. To solve 
it, I do not feel we need to take the solution to the API layer, i.e. changing 
the TimelineEntity object. To unblock this issue at the API layer, would it be 
better to go with the API change, i.e. the current patch attached? The storage 
layer issue could be discussed further in another JIRA.

In future, if any other storage is plugged in, the entity prefix would become 
stale.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524011#comment-15524011
 ] 

Varun Saxena commented on YARN-5585:


Just to add,

bq. And after providing entityPrefixId , does reader server sort the entities 
OR client need to sort it?
The reader will just return entities in lexicographic order. The prefix should 
be chosen so that the lexicographic order of entities is the sort order you 
want. For instance, you can send the prefix as the inverse of created time to 
ensure entities are returned in descending order of created time.
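
As a sketch of the inverted-prefix idea, assuming non-negative created times 
and a numeric (long) prefix: subtracting the created time from Long.MAX_VALUE 
gives a prefix whose ascending order is the descending order of created time. 
The class and method names are hypothetical and do not refer to an existing 
TimelineUtils method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of publishing with an inverted prefix.
final class InvertedPrefixSketch {

  // Larger created times map to smaller prefixes. Applying invert twice
  // returns the original value.
  static long invert(long createdTime) {
    return Long.MAX_VALUE - createdTime;
  }

  // Simulates a reader that returns rows in ascending prefix order and
  // reports the created times in the order the rows would come back.
  static List<Long> createdTimesInStoredOrder(List<Long> createdTimes) {
    List<Long> prefixes = new ArrayList<>();
    for (long t : createdTimes) {
      prefixes.add(invert(t));
    }
    Collections.sort(prefixes);
    List<Long> result = new ArrayList<>();
    for (long p : prefixes) {
      result.add(invert(p));
    }
    return result;
  }
}
```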




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-26 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523904#comment-15523904
 ] 

Li Lu commented on YARN-5585:
-

Hi [~rohithsharma], here is my understanding:
bq. IIUC from user perspective, while publishing entity, user need to provide 
entityPrefixId in TimelineEntity object. If plan is to add new entityPrefixId 
in as row key, and expecting to provide while publishing entity then why cant 
createdTime itself used as row key? If createdTime is not provided, however 
default sorting i.e lexicographical order will be returned.
This is doable, but providing a prefix field gives users more flexibility to 
customize sort orders. Users need only a single statement to put the created 
time or the inverted created time into the prefix. 

bq. Does there will be an default comparator for entityPrefixId in 
TimelineEntity object?
We were discussing completely reworking the compare methods of timeline 
entities to provide a natural order that is consistent with our discussion 
here. Thoughts? 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-26 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523429#comment-15523429
 ] 

Rohith Sharma K S commented on YARN-5585:
-

Thanks Varun, I got the overall picture of the solution.  
# IIUC from the user's perspective, while publishing an entity, the user needs to provide 
entityPrefixId in the TimelineEntity object. If the plan is to add a new entityPrefixId 
to the row key and expect it to be provided while publishing the entity, then *why can't 
createdTime itself be used in the row key?* If createdTime is not provided, the 
default sorting, i.e. lexicographical order, would be returned. 
# After entityPrefixId is provided, does the reader server sort the entities, or 
does the client need to sort them? 
# Will there be a default comparator for entityPrefixId in the TimelineEntity 
object? 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523186#comment-15523186
 ] 

Hadoop QA commented on YARN-5585:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice: The patch generated 6 new + 19 unchanged - 3 fixed = 25 total (was 22) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice generated 6 new + 0 unchanged - 0 fixed = 6 total (was 0) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 44s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m 43s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12830304/YARN-5585-workaround.patch |
| JIRA Issue | YARN-5585 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux d8b5dd2ff4d2 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 14a696f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13216/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt |
| javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/13216/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13216/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice U: 

[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523179#comment-15523179
 ] 

Varun Saxena commented on YARN-5585:


Alternatively, you can change your entity ID so that lexicographic 
sort order maps to creation time order, for example by adding padding, and 
convert it back on read. This, however, did not seem feasible.
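
To make the padding idea concrete, here is a minimal sketch. It assumes IDs of the form `name-number` and a fixed padding width of 10 digits; both are assumptions for illustration, and the class and method names are hypothetical. It shows only the mechanism, which the comment above considered impractical for the real system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: pad the numeric suffix on write, strip the padding on read (hypothetical helper). */
final class PaddedEntityId {
  // Assumed fixed width; every ID must fit in this many digits.
  static final int WIDTH = 10;

  /** Zero-pads the numeric suffix after the last '-' to WIDTH digits. */
  static String pad(String id) {
    int dash = id.lastIndexOf('-');
    long seq = Long.parseLong(id.substring(dash + 1));
    return id.substring(0, dash + 1) + String.format("%0" + WIDTH + "d", seq);
  }

  /** Reverses pad(): drops the leading zeros of the numeric suffix. */
  static String unpad(String padded) {
    int dash = padded.lastIndexOf('-');
    long seq = Long.parseLong(padded.substring(dash + 1));
    return padded.substring(0, dash + 1) + seq;
  }

  /** Sorts IDs by their padded form and returns them in their original form. */
  static List<String> sortNumerically(List<String> ids) {
    List<String> padded = new ArrayList<>();
    for (String id : ids) {
      padded.add(pad(id));
    }
    Collections.sort(padded); // plain lexicographic sort, as a row-key scan would give
    List<String> out = new ArrayList<>();
    for (String p : padded) {
      out.add(unpad(p));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> raw = new ArrayList<>();
    Collections.addAll(raw, "app-10", "app-2", "app-1");
    Collections.sort(raw);
    System.out.println(raw);                  // unpadded string order
    System.out.println(sortNumerically(raw)); // order after padding
  }
}
```

Without padding, string order places app-10 before app-2, which is the ordering problem discussed in this thread.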




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523177#comment-15523177
 ] 

Varun Saxena commented on YARN-5585:


bq. When a scan is performed on rows, the ResultScanner returns them in 
lexicographical order only. I could not understand where this entityIdPrefix will 
be used. Is it in the storage or the reader server?
The entity ID prefix will be supplied by Tez in your case, and can be the inverse of 
the created time if you want rows to be sorted in descending order by created 
time. The TimelineEntity class will now carry a prefix too.

bq. Will the new tables be separate, or the same existing ones?
Same tables. Only the row key changes if you are not happy with lexicographic 
order.
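
As a rough illustration of an "inverse of created time" prefix, the sketch below subtracts the created time from `Long.MAX_VALUE`, so that a newer entity gets a smaller prefix and therefore sorts first in an ascending scan. The class and method names are hypothetical, and the exact inversion scheme is an assumption rather than something fixed in this discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of an inverted created-time prefix (hypothetical helper, not ATSv2 code). */
final class InverseTimePrefix {

  /** Larger (newer) created times map to smaller prefixes. */
  static long prefixFor(long createdTimeMs) {
    return Long.MAX_VALUE - createdTimeMs;
  }

  /** Recovers the created time from a prefix produced by prefixFor(). */
  static long createdTimeOf(long prefix) {
    return Long.MAX_VALUE - prefix;
  }

  /**
   * Returns entity IDs newest-first by iterating prefixes in ascending order.
   * Entities with identical created times would collide in this toy map; a real
   * row key would also contain the entity ID after the prefix.
   */
  static List<String> newestFirst(Map<String, Long> createdTimeById) {
    TreeMap<Long, String> byPrefix = new TreeMap<>();
    for (Map.Entry<String, Long> e : createdTimeById.entrySet()) {
      byPrefix.put(prefixFor(e.getValue()), e.getKey());
    }
    return new ArrayList<>(byPrefix.values());
  }

  public static void main(String[] args) {
    Map<String, Long> created = new TreeMap<>();
    created.put("dag_1", 1000L);
    created.put("dag_2", 3000L);
    created.put("dag_3", 2000L);
    System.out.println(newestFirst(created));
  }
}
```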








[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-26 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523047#comment-15523047
 ] 

Rohith Sharma K S commented on YARN-5585:
-

Apologies, I was on leave! A lot of discussion has happened and I am trying to 
understand each comment. One thing I would like to say is: let's design the 
system in general. Later, Tez or any other users can be asked to adapt to the 
designed system with minimal changes. Tez publishes entity IDs without padding, 
so lexicographical order does not work for them.

CMIIW, 
# If {entityidprefix} is part of the row key, then row keys are sorted in 
lexicographical order. When a scan is performed on rows, the ResultScanner returns 
them in lexicographical order only. I could not understand where this 
entityIdPrefix will be used. Is it in the storage or the reader server?
# Limits on scanned rows cannot be applied from HBase, so the limit has to be 
handled by the reader server. I just attached a workaround patch which I am using 
for testing purposes. 
# bq. We can have a separate REST endpoint to distinguish between prefix based 
queries and non prefix based queries
Will the new tables be separate, or the same existing ones? 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-25 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521715#comment-15521715
 ] 

Joep Rottinghuis commented on YARN-5585:


Good point, something we need to check. We really want to avoid a 
read followed by a write; that would wreak havoc in scenarios where HBase is 
offline and would introduce race conditions.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521189#comment-15521189
 ] 

Sangjin Lee commented on YARN-5585:
---

I wasn't sure whether it was possible to update an existing row without 
providing the full row key and without reading first. If an update works with a 
column value filter, that's great. How would the performance be? It would be 
great if there is not much performance penalty in this case.

That said, it would be ideal if the framework can provide the same entity 
prefix on subsequent updates, which shouldn't be too hard to do.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-25 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520959#comment-15520959
 ] 

Joep Rottinghuis commented on YARN-5585:


I will have to look into the solution a bit deeper, but we need to be careful not 
to upend the entire structure of the GenericEntityReader and 
constructFilterListBasedOnFilters method hierarchy. TimelineEntityReader 
doesn't have an obvious place to inject the additional filter...




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-25 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520952#comment-15520952
 ] 

Joep Rottinghuis commented on YARN-5585:


[~sjlee0] wouldn't the column value filter on entity ID uniquely identify the 
row to update?

I think the additional challenge is that our setup allows for easy retrieval by 
row key and column name, but adding a column value filter means we'll have to 
add it to the provided filters on the fly. Users can already provide filters.
In this case we'll have to create an _and_ filter with our column value filter 
for entity ID on one side and the user-provided filters on the other. I think the 
plumbing already allows for a FilterList, so we'll have to add this one, 
preferably at the beginning of the list (no need to evaluate all other possibly 
complex filters when the entity ID doesn't match).
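
The reason for putting the entity ID check at the front of the _and_ list can be shown with a plain-Java analogy. This sketch does not use the HBase filter API: rows are modelled as maps, the `id` column name is an assumption, and the class is hypothetical. It relies only on `java.util.function.Predicate.and`, which short-circuits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Plain-Java analogy of an AND filter list with the entity-ID check first. */
final class EntityIdFirstFilter {

  /**
   * Builds (id == entityId) AND userFilter1 AND userFilter2 ...
   * Because the ID check comes first and and() short-circuits, the user
   * filters are never evaluated for rows whose ID does not match.
   */
  static Predicate<Map<String, String>> build(
      String entityId, List<Predicate<Map<String, String>>> userFilters) {
    Predicate<Map<String, String>> combined = row -> entityId.equals(row.get("id"));
    for (Predicate<Map<String, String>> f : userFilters) {
      combined = combined.and(f);
    }
    return combined;
  }

  public static void main(String[] args) {
    List<Predicate<Map<String, String>>> userFilters = new ArrayList<>();
    userFilters.add(row -> {
      System.out.println("evaluating user filter");
      return true;
    });
    Predicate<Map<String, String>> filter = build("vertex_2", userFilters);
    // The user filter is skipped for the first row and evaluated for the second.
    System.out.println(filter.test(Collections.singletonMap("id", "vertex_1")));
    System.out.println(filter.test(Collections.singletonMap("id", "vertex_2")));
  }
}
```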




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517864#comment-15517864
 ] 

Sangjin Lee commented on YARN-5585:
---

To be clear, in case of things like tez vertex id, it should be easy to form a 
number that is in the right order instead of the created time, using things 
like its sequence id. The issue was that the whole id was being treated as a 
string which may not reflect the right numeric order.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517489#comment-15517489
 ] 

Sangjin Lee commented on YARN-5585:
---

I do have one question for us to think about. With the entity id prefix, it 
becomes a requirement that the framework provide the entity id prefix on all 
writes, including *updates* to existing entities. This might become an 
interesting challenge.

For example, if tez wanted to use the created time of a vertex as its entity id 
prefix (for sorting), then even for subsequent updates of a vertex entity, tez 
would need to pass the created time, or it would be difficult to write.

[~vrushalic], [~jrottinghuis], thoughts?

[~rohithsharma], is it feasible for tez to provide created time every time a 
tez entity is created or *updated*?




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517482#comment-15517482
 ] 

Sangjin Lee commented on YARN-5585:
---

Thanks [~varun_saxena] for summarizing the discussion clearly. The description 
is accurate.

The entity id prefix is optional. If you're happy with the entity id order, 
there is nothing for the framework to do. If you want a different sort order 
than the entity id order, the framework should provide the prefix values on 
write.

Another point is that this alternate sort order is basically fixed per framework. 
In other words, you don't change it once a framework has adopted a certain 
natural sort order. If you want to re-sort dynamically, then we're really 
talking about reading all entities and sorting on the client side (i.e. 
browser). That is essentially the current YARN web UI behavior.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514243#comment-15514243
 ] 

Varun Saxena commented on YARN-5585:


Summarizing the solution we decided upon in the call.

* We will now return entities from the entity table in lexicographic order of 
entity IDs.
* To achieve a different sort order, we will provide a mechanism for 
applications to supply an entity ID prefix, which can be set in the 
TimelineEntity object while writing the entity to the backend.
* This entity ID prefix will be part of the row key in the entity table. As the 
name suggests, it will be present just before the entity ID. Applications can 
choose to provide no entity ID prefix if they are happy with the lexicographic 
sort order. So the row key will now be 
{{cluster!user!flow!flowrun!app!entitytype!\{entityidprefix\}!\{entityid\}}}
* Entity ID will also be stored under a column qualifier (this is being done 
already).
* Entity ID prefix can be a number (say a long), as numbers generally provide a 
natural sort ordering. However, this needs to be finalized. Should we keep it 
as a string instead?
* When querying multiple entities, we will return the top N entities, as decided 
by limit, in lexicographic order of entity ID prefix + entity ID (i.e. if an 
entity ID prefix is supplied). The fromId filter can now be something like 
fromIDPrefix (say), or a similar filter which provides prefix + ID to support 
pagination.
* While querying a single entity, the prefix can be supplied as a query param. 
If supplied, it will be a Get; otherwise we need a Scan with a 
SingleColumnValueFilter on entity ID (this will be comparatively slower). We 
can have a separate REST endpoint to distinguish between prefix-based queries 
and non-prefix-based queries. We need to distinguish between the case where no 
prefix was specified for an entity on the write path and the case where the 
prefix was simply not supplied on the read path (even though it was supplied on 
the write path). This needs to be finalized.
* The prefix will also be returned as part of the TimelineEntity object in the 
response.
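
To make the ordering concrete, here is a minimal sketch of how a row key with 
an entity ID prefix would sort. It is only an illustration under assumptions: 
the prefix is taken to be a non-negative long written as a fixed-width decimal 
string (the encoding is still open, as noted in the list), a {{TreeMap}} stands 
in for the lexicographically sorted entity table, and the names 
{{PrefixedRowKeySketch}}, {{rowKey}} and {{page}} are hypothetical, not ATSv2 
code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for the entity table, whose rows
// are kept sorted by row key.
final class PrefixedRowKeySketch {
  static final String SCOPE = "cluster!user!flow!flowrun!app!entitytype!";

  // Assumption: the prefix is a non-negative long, zero-padded to 19 digits
  // so that string order matches numeric order.
  static String rowKey(long idPrefix, String entityId) {
    return SCOPE + String.format("%019d", idPrefix) + "!" + entityId;
  }

  // Returns up to `limit` entity IDs whose row keys sort strictly after
  // fromKey (prefix + ID); pass null for the first page.
  static List<String> page(TreeMap<String, String> table, String fromKey,
      int limit) {
    Iterable<String> ids = (fromKey == null)
        ? table.values()
        : table.tailMap(fromKey, false).values();
    List<String> out = new ArrayList<>();
    for (String id : ids) {
      if (out.size() == limit) {
        break;
      }
      out.add(id);
    }
    return out;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    // The application-chosen prefix decides the order, not the entity ID.
    table.put(rowKey(3, "dag_a"), "dag_a");
    table.put(rowKey(1, "dag_c"), "dag_c");
    table.put(rowKey(2, "dag_b"), "dag_b");
    System.out.println(page(table, null, 2));               // [dag_c, dag_b]
    System.out.println(page(table, rowKey(2, "dag_b"), 2)); // [dag_a]
  }
}
```

The second call shows why the pagination filter needs prefix + ID rather than 
the ID alone: the cursor has to be a complete position in row key order.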

cc [~jrottinghuis], [~sjlee0], [~vrushalic], [~gtCarrera9]. Hope this covers 
everything.

The reason this solution was chosen is that we thought that, in UI use cases, a 
single entity read would typically follow a listing of multiple entities, and 
hence the prefix would be known. This does not mean, however, that we will not 
provide a mechanism to fetch an entity if the prefix wasn't given. We can use a 
single column value filter then.
Moreover, this solution overall had a lower write or read penalty compared to 
the solutions listed above.





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513781#comment-15513781
 ] 

Sangjin Lee commented on YARN-5585:
---

Thanks for your comments [~varun_saxena]. Yes, we should discuss this during 
the call and report back here.

Before we go into how to implement this, I think we need to have a consensus on 
the requirements first. Querying for entities is a fairly generic thing, and IMO 
there should be a clear expectation of the order in which they are queried. It 
affects *which* entities get selected as well as in what order they are sorted. 
As I mentioned, I don't think it would be desirable to leave this order 
completely arbitrary, or things could get quite confusing really quickly.

My preference for this sorting order is either the entity id (descending) order 
or the chronological order. I think the entity id order is the simplest and 
easiest to understand, and for the most part identical to the chronological 
order. YARN entities are mostly compliant (so are MR entities), and it would 
not be unreasonable to ask frameworks to maintain entity ids that way. Even if 
that is not feasible, there would be a very consistent understanding of how 
entities would be returned to the reader. That's the default sorting order in 
the current YARN RM web UI too. Can Tez adopt a stricter entity id scheme? If 
not, would it at least be acceptable if entities are consistently returned in 
that order?

If we go with the chronological order (created time), then I would want it to 
be consistent. Then we should do it not only for framework entities but also 
YARN entities and change the row key schema for all. And I think that may 
require the secondary lookup table (yes, I understand this would be only for 
lookups and not for data).

Another point about sorting within the timeline reader code. If the query is 
specified with a limit, the limit is passed to the hbase client, and as such it 
will only return that number of entities (or fewer), right? I don't think hbase 
will return more than the specified limit, no? Then I don't understand how you 
would get a *different* set of tez entities than what you expected. For 
example, if there are entity 1 through 10, and your limit was 5, I would expect 
hbase to return 6 through 10 still. The reader code may rearrange them so that 
6 is at the top, but I don't expect hbase to return anything other than 6 
through 10. [~rohithsharma], could you confirm? Did I understand this right?

Also, apart from fixing the sorting in {{TimelineEntity.compareTo()}}, I am not 
sure if we need to re-sort the entities that are returned by hbase again in the 
timeline reader code. The result set from hbase should return them in the right 
order, right? Then I think we should simply return them in the same order 
without applying any further sorting. In other words, instead of using a sorted 
set, we should use the insertion-order set. Thoughts? [~varun_saxena]
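
A small sketch of the difference between the two kinds of set, assuming plain 
strings in place of TimelineEntity objects and an assumed scan result that 
already arrives in descending ID order. The class and method names are 
hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ResultOrderSketch {
  // Keeps entities exactly in the order the backend returned them.
  static Set<String> keepScanOrder(List<String> scanned) {
    return new LinkedHashSet<>(scanned);
  }

  // Re-sorts by natural (ascending) order, discarding the scan order.
  static Set<String> resort(List<String> scanned) {
    return new TreeSet<>(scanned);
  }

  public static void main(String[] args) {
    // Assumed scan result, already in descending ID order.
    List<String> scanned =
        Arrays.asList("entity_10", "entity_09", "entity_08");
    System.out.println(keepScanOrder(scanned)); // [entity_10, entity_09, entity_08]
    System.out.println(resort(scanned));        // [entity_08, entity_09, entity_10]
  }
}
```

With the insertion-order set, the reader's output order is whatever the row 
key order is, so there is only one place where ordering is defined.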






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512673#comment-15512673
 ] 

Varun Saxena commented on YARN-5585:


Just to clarify, what I meant by having another index table was not to store 
data in it. It only stores the entity ID for 
cluster!user!flow!run!app!entitytype and the inverted created time.
A write to this table happens only when the created time is reported, i.e. 
(most probably) when the application reports the created time on the start event.
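
As an illustration of why the created time is inverted, the sketch below 
assumes a non-negative created time in milliseconds, inverts it as 
{{Long.MAX_VALUE - createdTime}}, and writes it at a fixed width so that an 
ascending scan yields the newest entity first. The key layout and the names 
{{InvertedTimeIndexSketch}}, {{invert}} and {{indexKey}} are hypothetical.

```java
import java.util.TreeMap;

final class InvertedTimeIndexSketch {
  // Inverting the timestamp makes newer entities sort first in ascending
  // key order. Assumes createdTime >= 0.
  static long invert(long createdTime) {
    return Long.MAX_VALUE - createdTime;
  }

  // Hypothetical index row key: scope, inverted created time, entity ID.
  static String indexKey(String scope, long createdTime, String entityId) {
    return scope + "!" + String.format("%019d", invert(createdTime))
        + "!" + entityId;
  }

  public static void main(String[] args) {
    String scope = "cluster!user!flow!run!app!entitytype";
    // TreeMap stands in for the sorted index table; values are entity IDs.
    TreeMap<String, String> index = new TreeMap<>();
    index.put(indexKey(scope, 1000L, "e_old"), "e_old");
    index.put(indexKey(scope, 3000L, "e_new"), "e_new");
    index.put(indexKey(scope, 2000L, "e_mid"), "e_mid");
    System.out.println(index.values()); // [e_new, e_mid, e_old]
  }
}
```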

As part of the interface, we claim that entities will be returned sorted by 
created time in descending order, so I felt we should definitely support this 
use case, 
whether or not we support sorting by some other parameter.
Currently we iterate over all the entities within the scope of an entity type 
to arrive at the sorted set of entities. So this, IMO, should definitely be 
fixed by providing some sort of index table.

In the 2nd point in my  [comment above | 
https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15494251=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15494251]
 we can query individual entities by entity ID directly from the entity table.
One more suggestion was to open up an interface which can be used to provide 
encoding and decoding for specific entity IDs (based on entity type) as part 
of the row key.
This would not require any extra write or read. However, Li and Rohith seemed 
to be a little reluctant about that solution, as Tez or Spark will have to add 
code for it, albeit only a little bit.

However, as [~vrushalic] suggested, we can also create an auxiliary table and 
specify the key in the timeline entity. The issue with this is that we are sort 
of exposing the internal implementation.
This can, however, be useful if, as pointed out, we want to sort by something 
other than created time as well. The problem, though, can be the double write. 
How about having this auxiliary table as an index table, with one write just 
to make an entry into this table? 
On the read side, we can refer to this index table following the 
suggestion made by Vrushali, i.e. specify the index table and start row key, and 
then use MultiRowRangeFilter to get records from the entity table.
Thoughts?

However, I do feel we inherently need to support the created-time-based sorting 
scenario (i.e. have a created-time-based index table as a mandatory table, 
without the user needing to specify it in REST), as we promise in the interface 
that entities will be sorted in that fashion.

We can probably discuss this further in the call today.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-21 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512200#comment-15512200
 ] 

Sangjin Lee commented on YARN-5585:
---

I am also catching up on this discussion (sorry it got delayed).

Generally I am in agreement with Varun and Vrushali on possible approaches. I'd 
like to add a few more thoughts to refine the idea.

(1) supporting chronological order sorting
I think that even for framework-specific entities (e.g. tez vertices, MR task 
entities, etc.), the "sorting" order cannot be completely arbitrary. Because we 
have a strong design decision on reflecting recency in the row keys, the 
natural sorting order should be the *chronological order*, or strange things 
would result.

For YARN entities, the id order would satisfy this for the most part (and ditto 
for MR entities). If Tez can craft the ids such that the lexicographical order 
is also the chronological order, that would be by far the simplest solution to 
the problem. I'm not sure how feasible it is for Tez to add padding etc. to 
preserve the chronological order in the entity ids. [~rohithsharma], can we 
change the ids to order them properly?
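
To show what padding buys, here is a minimal sketch with a hypothetical ID 
scheme of the form name_sequence (not the real Tez ID format). Without padding, 
sequence 10 sorts before sequence 9 as a string; with a fixed width, string 
order follows the sequence.

```java
final class PaddedIdSketch {
  // Hypothetical ID scheme "<name>_<sequence>": pad the sequence to a fixed
  // width so that lexicographic order follows the numeric sequence.
  static String padded(String name, int sequence) {
    return String.format("%s_%06d", name, sequence);
  }

  public static void main(String[] args) {
    // Unpadded: "dag_10" sorts before "dag_9" because '1' < '9'.
    System.out.println("dag_10".compareTo("dag_9") < 0);                   // true
    // Padded: "dag_000010" sorts after "dag_000009".
    System.out.println(padded("dag", 10).compareTo(padded("dag", 9)) > 0); // true
  }
}
```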

If the framework cannot make the id lexicographical order the same as the 
chronological order, then we might have to introduce the notion of bytes 
provided by the framework (and an auxiliary table) to support this, as suggested 
by Vrushali and Varun. But that would come at some cost. All things being 
equal, I would love not to populate another table on the write path.

Also note that we still need to be able to support single-entity queries in 
this case (i.e. queries by entity id). How would we be able to support queries 
by id in this case?

(2) setting the created time field
In timeline service v.2, the strong assumption/requirement is that the created 
time is set by the client. It sounds like the current tez code does not set the 
created time. I think it should be set. That's the contract we're using. We're 
not really expecting an empty created time when we write them.

(3) TimelineEntity.compareTo()
It is a good catch by Rohith. It escaped the review, but it does appear that 
the id sorting when the created time is empty is the opposite of what it should 
be. The id strings should be sorted in descending order, but the current code is 
doing the opposite. This should be fixed. We can either fix it here or open 
a separate subtask to fix it. Either way, we should fix it.
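
For reference, the intended ordering could be sketched roughly as below. This 
is not the actual {{TimelineEntity.compareTo()}} code; {{Entity}} is a 
simplified stand-in, and placing entities without a created time after those 
that have one is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class EntityOrderSketch {
  // Simplified stand-in for an entity; createdTime is null when not set.
  static final class Entity {
    final String id;
    final Long createdTime;

    Entity(String id, Long createdTime) {
      this.id = id;
      this.createdTime = createdTime;
    }
  }

  // Newest created time first. Entities without a created time come last
  // (an assumption) and are ordered by descending ID.
  static final Comparator<Entity> NEWEST_FIRST =
      Comparator.comparing((Entity e) -> e.createdTime,
              Comparator.nullsLast(Comparator.<Long>reverseOrder()))
          .thenComparing((Entity e) -> e.id, Comparator.<String>reverseOrder());

  static List<String> sortedIds(List<Entity> entities) {
    List<Entity> copy = new ArrayList<>(entities);
    copy.sort(NEWEST_FIRST);
    List<String> ids = new ArrayList<>();
    for (Entity e : copy) {
      ids.add(e.id);
    }
    return ids;
  }

  public static void main(String[] args) {
    List<Entity> entities = Arrays.asList(
        new Entity("a", null), new Entity("c", null), new Entity("b", 5L));
    System.out.println(sortedIds(entities)); // [b, c, a]
  }
}
```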




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-21 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512079#comment-15512079
 ] 

Vrushali C commented on YARN-5585:
--

I have been thinking more on this. I think if there is a concern about having 
the same entity data in two tables, what we could do is set a TTL (time to 
live) on the cells in the auxiliary table. That way, for some period of time 
we store data in two places, but then it gets cleaned up. 

For example, if Tez UI queries for data in the auxiliary table for a job that 
ran 1 year back, then say, it does not exist anymore in the auxiliary table 
since it got cleaned up by hbase. Now the Tez UI can try querying the regular 
table. Or the auxiliary REST api call can take a parameter that says if data is 
not found in auxiliary table, please query the regular entity table and the 
rest call would perhaps then take a little longer to return. Since we are 
querying for something that ran 1 year back, I believe we can wait for an extra 
moment for the call to return.

This way, we store data in two tables for a brief time period, rely on hbase to 
clean up cells as per their TTL and provide a way for frameworks to store/query 
their data in harmony with timeline service storage.
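
The fallback described above can be sketched roughly as follows. Plain maps 
stand in for the two tables, a missing key in the auxiliary map models a cell 
that has expired under its TTL, and the {{fallbackToRegular}} flag plays the 
role of the proposed REST parameter. All names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class AuxiliaryFallbackSketch {
  // Looks in the auxiliary table first. If the entry is gone (for example
  // expired by TTL) and the caller allows it, falls back to the regular
  // entity table, which is the slower path.
  static Optional<String> read(Map<String, String> auxiliary,
      Map<String, String> regular, String entityId,
      boolean fallbackToRegular) {
    String hit = auxiliary.get(entityId);
    if (hit == null && fallbackToRegular) {
      hit = regular.get(entityId);
    }
    return Optional.ofNullable(hit);
  }

  public static void main(String[] args) {
    Map<String, String> auxiliary = new HashMap<>(); // cell already expired
    Map<String, String> regular = new HashMap<>();
    regular.put("dag_1", "entity data");
    System.out.println(read(auxiliary, regular, "dag_1", true));  // Optional[entity data]
    System.out.println(read(auxiliary, regular, "dag_1", false)); // Optional.empty
  }
}
```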




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-21 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511129#comment-15511129
 ] 

Vrushali C commented on YARN-5585:
--

Catching up on this thread. I have tried to read through all the comments and 
discussions on this jira but please correct me if I am mistaken.

Two objectives here:
1) We are looking for a way to paginate results/response. 
2) Ability to return sorted results in the rest response (sorted on something 
other than row key)

Thoughts on these:
- We are looking for a way to paginate results/response. 
This pagination requirement is independent of any particular framework like 
Tez. With hRaven, our experience has been that more often than not, we end up 
enabling pagination support for most APIs.  So, in general, our rest api calls 
should support pagination. 

Pagination via the REST query:
This involves, in a generic fashion, being able to send in a “startFromRowKey” 
in the rest query. Say we extend our rest apis to accept such a parameter, it 
becomes generic enough to fetch N rows after this particular “startFromRowKey” 
value. The first rest api call will not send in anything, but each rest 
response will return “lastRowKey” to the client so that the client can use this 
in the next rest call. I have found this to be also useful for debugging the 
rest output on the browser.
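
A rough sketch of that paging loop, assuming a {{TreeMap}} in place of the 
sorted table and treating “startFromRowKey” as exclusive, as described above. 
The {{Page}} holder and the {{fetch}} method are hypothetical names used only 
for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RowKeyPagingSketch {
  // One REST response: the rows plus the cursor for the next call.
  static final class Page {
    final List<String> rows = new ArrayList<>();
    String lastRowKey; // stays null when the page is empty
  }

  // Returns up to `limit` rows strictly after startFromRowKey; a null
  // startFromRowKey means "start from the beginning".
  static Page fetch(TreeMap<String, String> table, String startFromRowKey,
      int limit) {
    Map<String, String> view = (startFromRowKey == null)
        ? table
        : table.tailMap(startFromRowKey, false);
    Page page = new Page();
    for (Map.Entry<String, String> e : view.entrySet()) {
      if (page.rows.size() == limit) {
        break;
      }
      page.rows.add(e.getValue());
      page.lastRowKey = e.getKey();
    }
    return page;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 10; i++) {
      String key = String.format("app-%02d", i);
      table.put(key, key);
    }
    String cursor = null; // the first call sends nothing
    while (true) {
      Page page = fetch(table, cursor, 5);
      if (page.rows.isEmpty()) {
        break;
      }
      System.out.println(page.rows);
      cursor = page.lastRowKey; // the client echoes this back next time
    }
  }
}
```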

- For Tez in particular, we need the ability to return sorted results in the 
rest response. In this case, results sorted based on “creation_time”.  The 
currently existing row key in the entity table does not easily allow for 
retrieval in sorted order of creation time. 

So here is a proposal which incorporates some aspects of both of your 
proposals, Varun. 

I think we should expose a way for frameworks like Tez to store data sorted as 
per their criteria. And also allow them to specify when they want to query this 
specially sorted data. 

Today, Tez wants it sorted in entity creation time. Tomorrow, that could 
change. Also, today some other framework like Spark might want entities sorted 
based on something else. So putting it in the entity table's row key becomes a 
tough decision.

I propose we allow for auxiliary tables to be created for entities via cluster 
configuration settings. The auxiliary table name etc. will be set in config 
just like the timeline entity table name is set. This auxiliary table is 
specifically for entities, so it has the same structure. 

Now, when Tez’s timeline client creates a timeline entity, it will create it as 
it does right now but, in addition, it will populate two new members of the 
TimelineEntity object:
- auxiliaryTableName, which contains the desired table name
- auxiliaryEncodedKey, which contains a byte array value of 
“Inv(creation_time)!entity_id”. This is to be used as part of the row key 
suffix in the auxiliary table. Timeline service does not know what this byte 
value is, and it does not care. It only adds this after the regular row key 
prefix of “user!cluster!flow!Inv(flow run id)!application!entitytype!”

Now it sends this write to timeline service. On the hbase writer side, we 
notice that the auxiliary table and auxiliary key are populated in the timeline 
entity object, so we do two writes. One write goes to our regular entity table 
with the existing row key structure, and the other write goes to the auxiliary 
table with a row key of 
“user!cluster!flow!Inv(flow run id)!application!entitytype!” followed by the 
auxiliary encoded key.

On the reader side, we allow the rest api to specify explicitly whether the 
client wants to read from the auxiliary table. Otherwise, reads go to the 
regular entity table. Frameworks like Tez know when they need data sorted by 
creation time, perhaps in their UI, so they can specify as part of the query 
params in their rest query that it is for the auxiliary table.  

This way, we provide frameworks a way to store data in whichever sorted order 
they want, and let them determine which queries need that sorted data. 





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510573#comment-15510573
 ] 

Varun Saxena commented on YARN-5585:


I think the HBase table schema change I suggested above can be done in another 
JIRA, and the fromId filter added in this JIRA.
Is that fine with you, [~rohithsharma]? I would like to handle that JIRA, if 
you are OK with it, and demonstrate the points mentioned above.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494251#comment-15494251
 ] 

Varun Saxena commented on YARN-5585:


Just to summarise the suggestions given for folks to refer to.

* Applications (like Tez) would know best how to interpret their entity IDs 
and how they can be sorted in descending order. Most entity IDs seem to have 
some sort of monotonically increasing sequence, like the app ID. We can hence 
open up a PUBLIC interface which ATSv2 users like Tez can implement to decide 
how to encode and decode a particular entity type so that it is stored in 
descending sorted fashion (based on creation time) in ATSv2. The encoding and 
decoding would be similar to the AppIDConverter written in our code. If the row 
keys themselves can be sorted, this will be the best possible solution 
performance-wise. Refer to 
[comment | 
https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15470803=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15470803]
** _Pros of the approach:_ 
**# Lookup will be fast.
** _Cons of the approach:_ 
**# We are depending on the application to provide some code for this to work. 
The corresponding JAR will have to be placed in the classpath. Folks in other 
projects may not be pleased about not having built-in support for this in ATS.
**# Entity IDs may not always have a monotonically increasing sequence like 
app IDs.

* We can keep another table, say EntityCreationTable or EntityIndexTable, with 
the row key {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation 
time!entityid}}. We will make an entry into this table whenever the created 
time is reported for the entity. The real data would still reside in the main 
entity table. Entities in this table will be sorted in descending order. On the 
read side, we can first peek into this table to get the relevant records in 
descending order (based on limit and/or fromId) and then use this info to query 
the entity table. We can do this in two ways. We can get the created times by 
querying this index table and apply a created time range filter. Or, 
alternatively, we can try out MultiRowRangeFilter, which, going by the HBase 
javadoc, seems to be efficient. We will 
have to do some processing to determine these multiple row key ranges.  Refer 
to [comment | 
https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15472669=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15472669]
** _Note:_  The client should not send different created times for the same 
entity, otherwise that will lead to an additional row.  If a different created 
time is reported more than once, we will have to consider the latest one.
** _Pros of the approach:_ 
**# Solution provided within ATS.
**# Extra write only when created time is reported.
** _Cons of the approach:_ 
**# Extra peek into the index table on the read side. Single entity read can 
still be served directly from entity table though.

* Another option would be to change the row key of the entity table to 
cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid 
and have another table to map cluster!user!flow!flowrun!app!entitytype!entityid 
to the entity created time.
So for a single entity call (HBase Get), we will have to first peek into the 
new table and then get records from the entity table.
** _Cons of the approach:_ 
**# On the write side, we will either have to first look up the index table 
which has the entity created time, or the client should supply the entity 
created time on every write. The former would impact write performance and the 
latter may not be feasible for the client.
**# What should the row key be if the client does not supply the created time 
on the first write but supplies it on a subsequent write?

cc [~sjlee0], [~vrushalic], [~rohithsharma], [~gtCarrera9]


[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472946#comment-15472946
 ] 

Varun Saxena commented on YARN-5585:


But padding isn't required if we do not store the DAG ID or vertex ID as a 
string and instead store it as a long + int + int encoded byte array. Anyway, 
this only comes into the picture if we adopt the option of opening a PUBLIC 
interface which apps like Tez implement for entity IDs that can potentially be 
sorted in the same order as that of their creation (though this may not be true 
of Tez entities either).
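
To illustrate the padding point, here is a minimal sketch in Python (not ATSv2 or Tez code). It assumes an ID shaped like dag_<clusterTimestamp>_<appSeq>_<dagSeq>; the helper name encode_dag_id is hypothetical. It compares plain string ordering with the ordering of a fixed-width long + int + int byte array.

```python
import struct

def encode_dag_id(dag_id: str) -> bytes:
    """Pack an assumed 'dag_<clusterTs>_<appSeq>_<dagSeq>' ID as long + int + int."""
    _, cluster_ts, app_seq, dag_seq = dag_id.split("_")
    # ">qii" is a big-endian 8-byte long followed by two 4-byte ints, so that
    # byte-wise comparison matches numeric comparison for non-negative values.
    return struct.pack(">qii", int(cluster_ts), int(app_seq), int(dag_seq))

ids = [f"dag_1471931266232_0008_{n}" for n in (1, 2, 10)]

# Without padding, plain string comparison puts "..._10" before "..._2".
string_order = sorted(ids)

# The fixed-width binary form sorts in numeric order with no padding needed.
binary_order = sorted(ids, key=encode_dag_id)
```

Here string_order ends with the DAG numbered 2, while binary_order ends with the DAG numbered 10, which is the order a paginating reader would expect.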

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Example : If applications are stored in the database as app-1 app-2 ... app-10, 
> *getApps?limit=5* gives app-1 to app-5. But to retrieve the next 5 apps, it is 
> difficult.
> So the proposal is to have fromId in the filter like 
> *getApps?limit=5&fromId=app-5* which gives the list of apps from app-6 to 
> app-10. 
> This is very useful for pagination in web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472939#comment-15472939
 ] 

Rohith Sharma K S commented on YARN-5585:
-

Bumping up the priority of the task since it is a major drawback in ATSv2. 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472922#comment-15472922
 ] 

Rohith Sharma K S commented on YARN-5585:
-

It is right, there is no proper padding for DAG entities. Sample output for TEZ 
is 
{noformat}
[
  {
    "metrics": [],
    "events": [],
    "type": "TEZ_VERTEX_ID",
    "id": "tez_vertex_1471931266232_0008_1_00",
    "createdtime": 1471939605434,
    "info": {
      "UID": "yarn-cluster!application_1471931266232_0008!TEZ_VERTEX_ID!tez_vertex_1471931266232_0008_1_00"
    },
    "configs": {},
    "isrelatedto": {},
    "relatesto": {}
  },
  {
    "metrics": [],
    "events": [],
    "type": "TEZ_VERTEX_ID",
    "id": "tez_vertex_1471931266232_0008_1_02",
    "createdtime": 1471939605414,
    "info": {
      "UID": "yarn-cluster!application_1471931266232_0008!TEZ_VERTEX_ID!tez_vertex_1471931266232_0008_1_02"
    },
    "configs": {},
    "isrelatedto": {},
    "relatesto": {}
  },
  {
    "metrics": [],
    "events": [],
    "type": "TEZ_VERTEX_ID",
    "id": "tez_vertex_1471931266232_0008_1_01",
    "createdtime": 1471939605405,
    "info": {
      "UID": "yarn-cluster!application_1471931266232_0008!TEZ_VERTEX_ID!tez_vertex_1471931266232_0008_1_01"
    },
    "configs": {},
    "isrelatedto": {},
    "relatesto": {}
  }
]
{noformat}




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472837#comment-15472837
 ] 

Varun Saxena commented on YARN-5585:


IIUC, a Tez DAG ID is a combination of the YARN App ID and a DAG sequence ID.
Isn't this DAG sequence ID monotonically increasing and assigned to DAGs as 
they are run in sequence?
I was assuming it was. That is why I suggested storing the DAG ID as 16 bytes (8 
bytes of inverted cluster timestamp from the app ID + 4 bytes of inverted seq ID 
from the app ID + 4 bytes of inverted DAG seq number). Padding won't be required 
in this case.

Anyway, other solutions have been proposed and we can come back to this only if 
necessary.
Or maybe we can have both the above solution and the one below as well.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472689#comment-15472689
 ] 

Varun Saxena commented on YARN-5585:


Also [~rohithsharma], if it's feasible, kindly consolidate all the use cases of 
Tez (from an ATS perspective) and send out a mail to the ATS team so that we can 
discuss them further with everyone in the team.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472676#comment-15472676
 ] 

Varun Saxena commented on YARN-5585:


Another option would be to change the row key of entity table to 
{{cluster!user!flow!flowrun!app!entitytype!reverse entity creation 
time!entityid}} and have another table to map 
{{cluster!user!flow!flowrun!app!entitytype!entityid}} to entity created time.
So for a single entity call (HBase Get) we will have to first peek into the new 
table and then get records from entity table.
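
A rough sketch of this two-table layout, using in-memory Python dictionaries in place of HBase tables. The function names, the tuple-shaped row key, and the user, flow and flow run parts of the prefix are assumptions for illustration only; the vertex IDs and created times are taken from the sample output posted earlier in this thread.

```python
LONG_MAX = 2**63 - 1

entity_table = {}  # full row key -> entity data
index_table = {}   # (prefix, entity id) -> created time

def row_key(prefix, created_time, entity_id):
    # Reversed creation time makes newer entities sort first.
    return (prefix, LONG_MAX - created_time, entity_id)

def put_entity(prefix, entity_id, created_time, data):
    # Every write touches both tables.
    index_table[(prefix, entity_id)] = created_time
    entity_table[row_key(prefix, created_time, entity_id)] = data

def get_entity(prefix, entity_id):
    # First peek into the index table to learn the created time ...
    created_time = index_table.get((prefix, entity_id))
    if created_time is None:
        return None
    # ... then read the entity table with the complete row key.
    return entity_table[row_key(prefix, created_time, entity_id)]

# The user, flow and flow run parts of this prefix are made up for the example.
PREFIX = "yarn-cluster!user1!flow1!1!application_1471931266232_0008!TEZ_VERTEX_ID"
put_entity(PREFIX, "tez_vertex_1471931266232_0008_1_00", 1471939605434, {"n": 0})
put_entity(PREFIX, "tez_vertex_1471931266232_0008_1_01", 1471939605405, {"n": 1})
put_entity(PREFIX, "tez_vertex_1471931266232_0008_1_02", 1471939605414, {"n": 2})

# Iterating the keys in sorted order stands in for an HBase scan.
newest_first = [key[2] for key in sorted(entity_table)]
```

The scan order comes out newest first (vertex 00, then 02, then 01), at the price of one extra lookup for every single-entity read.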




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472669#comment-15472669
 ] 

Varun Saxena commented on YARN-5585:


Another solution which comes to mind is that we keep another table, say 
EntityCreationTable, with row key 
{{cluster!user!flow!flowrun!app!entitytype!reverse entity creation 
time!entityid}}. We will make an entry into this table whenever the created 
time is reported for the entity. The real data would still reside in the main 
entity table. Entities in this table will be sorted in descending order of 
creation time.

And as the goal is to achieve pagination, we can introduce something like a 
fromCreatedTime query param.
The pagination use case is to get chunks of data. Let us say we want the first 
10 records. In this case, we will send a query with a limit of 10 and no 
fromCreatedTime query param.
So when a query arrives and fromCreatedTime is not there, we start reading from 
this table with the start row as {{cluster!user!flow!flowrun!app!entitytype!}}, 
up to the number of records specified by the {{limit}} query param. We can break 
as soon as 10 records are found and need not parse through all rows, as is done 
right now for the entity table.

Now if what we want is to return only the default view of the entity, i.e. 
entity id, type and created time, we can return a result set straight away. 
Otherwise, to get more detailed data, we need to get hold of the first and 
last entities retrieved from EntityCreationTable and make a scan on EntityTable 
with a Single Column Value filter with a created time range (the code is already 
there for this). 

This would still require a full scan within the scope of the entity type, but 
most results will be removed by HBase at the server end itself because of the 
created time range filter. Which approach is better, directly dipping into the 
entity table or querying 2 tables, depends entirely on how many records we have 
in the entity table within the scope of that entity type.

Once the client gets the first 10 records, it can make the next query to get 
records 11-20 by populating fromCreatedTime with the created time of the 10th 
record. The next scan in EntityCreationTable can be made on the basis of that. 
fromId must also be used in conjunction with fromCreatedTime though.
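
The paging flow over EntityCreationTable can be sketched as follows. This is a simplified Python model, not the reader implementation: the table is a sorted list of (reversed created time, entity id) tuples, the function name page is hypothetical, and the cursor is assumed to be exclusive, so the 10th record is not returned again.

```python
from bisect import bisect_right

LONG_MAX = 2**63 - 1

def page(creation_rows, limit, from_created_time=None, from_id=None):
    """creation_rows: sorted list of (reversed created time, entity id) tuples."""
    start = 0
    if from_created_time is not None:
        # Resume just past the last row of the previous page. fromId breaks
        # ties between entities that share the same created time.
        cursor = (LONG_MAX - from_created_time, from_id or "")
        start = bisect_right(creation_rows, cursor)
    rows = creation_rows[start:start + limit]
    return [(LONG_MAX - rev, entity_id) for rev, entity_id in rows]

# 25 made-up entities; consecutive ones often share a created time.
entities = [(f"entity_{i:03d}", 1000 + i // 2) for i in range(1, 26)]
creation_rows = sorted((LONG_MAX - t, eid) for eid, t in entities)

first = page(creation_rows, 10)
second = page(creation_rows, 10, *first[-1])   # created time and id of the 10th record
third = page(creation_rows, 10, *second[-1])
```

If from_id is left out, the next page in this model starts again at the first entity sharing the cursor's created time, which is why fromId has to accompany fromCreatedTime.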

For this solution client must not report duplicate created time multiple times.

Also, I am not 100% sure, but can a coprocessor be used for this extra call, so 
that the client is not involved?




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472636#comment-15472636
 ] 

Li Lu commented on YARN-5585:
-

bq. DAG ID seems to be generated same way.
Unfortunately this is not true for Tez... There is no proper padding for the 
DAG number, so we cannot do the pagination by the entity ID itself... 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472526#comment-15472526
 ] 

Varun Saxena commented on YARN-5585:


bq. Just realized that a normal converter will not address the use case where 
users really want entities sorted by their creation time
Yes, this is just for use cases where entity IDs are structured in such a 
manner that there is a direct correlation between entity ID order and creation 
time order. DAG ID seems to be generated the same way, and that seems to be the 
case in Spark too. If row keys are sorted, that will be the best solution from a 
performance perspective. However, there is the disadvantage of putting the 
burden on the application to write some extra code and make sure that the 
JARs are placed during deployment. That is why I asked if this will be 
acceptable to Tez.
On further thought though, ATS related behavior can also break for 
applications if they choose to change their IDs in future in a manner where 
they are no longer sorted, however unlikely that may be. 

Let me think of something else then.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472111#comment-15472111
 ] 

Li Lu commented on YARN-5585:
-

Just realized that a normal converter will not address the use case where users 
really want entities sorted by their creation time, unless we introduce a 
second table to index those data... 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472083#comment-15472083
 ] 

Li Lu commented on YARN-5585:
-

I think we're overcomplicating the problem here... I believe the general use 
case of this JIRA is mostly pagination: given a uniquely defined type of 
entities in one application, if the total number of entities is greater than 
the given limit, can we provide an API to allow fetching the data in multiple 
batches? So right now we have a list of entities and limit = 10. What we want 
is to initially fetch the first 10 entities, then, given fromId = entity_010, 
fetch the next 10, and so on and so forth. According to Rohith's use case, I 
think it's totally fine to say that all entities are ordered by their IDs 
lexicographically (especially for entities with proper padding on numbers, like 
container id). Actually, any consistent order will do the work for pagination; 
the only problem is how to make it make sense to the users. 
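
As a small illustration of paging by ID alone, the Python sketch below walks a lexicographically sorted ID list in batches. The names get_entities and fetch_all are hypothetical, and fromId is treated as exclusive here, as in the issue description (fromId=app-5 returns app-6 onwards).

```python
from bisect import bisect_right

def get_entities(sorted_ids, limit, from_id=None):
    """Return up to `limit` IDs that sort strictly after from_id."""
    start = 0 if from_id is None else bisect_right(sorted_ids, from_id)
    return sorted_ids[start:start + limit]

def fetch_all(sorted_ids, limit):
    """Client-side loop: keep asking for the next batch until one comes back short."""
    pages, from_id = [], None
    while True:
        batch = get_entities(sorted_ids, limit, from_id)
        if batch:
            pages.append(batch)
        if len(batch) < limit:
            return pages
        from_id = batch[-1]

# Zero-padded IDs, so lexicographic order equals numeric order.
ids = sorted(f"entity_{i:03d}" for i in range(1, 26))
pages = fetch_all(ids, 10)
```

With 25 padded IDs and limit = 10 this yields three batches of 10, 10 and 5 entities, and each request only needs the last ID of the previous batch.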

The real problem here is that we need to return everything in an order sorted 
by creation time, which seems to be quite hard in our current data model. 
This was pretty easy in ATS v1, where creation time is baked into the row key 
for each entity. I remember there were some discussions about this a while ago, 
but the general conclusion was that we mainly rely on the use cases themselves 
to guarantee consistency between creation time and entity id. To me, the 
potential problem of sorting entities according to their creation time to 
implement pagination is that we have to first fetch _all_ of them from HBase to 
form the order, which really kills most of the advantage of pagination. 

An ID encoder/decoder will be very helpful to this use case. However, having 
the application write the encode/decode process seems to be introducing more 
load to application programmers. It also introduces extra work for deployments 
since cluster operators need to handle third-party plugins. Can we provide 
several "SORT BY" options for timeline entity types, so that we store their ids 
accordingly? 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470803#comment-15470803
 ] 

Varun Saxena commented on YARN-5585:


The best solution for this would be for keys to be stored in sorted order in 
the Timeline service.
But a generic entity in ATSv2 terms can be anything, so it would be impossible 
for us to specify well defined generic behaviour for every entity under the sun. 
Only applications which are users of ATSv2 will be sure of what entity ID means 
for them. For Tez, it may be DAG or Vertex ID, for Spark it may be task ID and 
so on.

But then, as your use case suggests, ATSv2 just brushing off such requests from 
users may not be very good, as it might be useful for users of ATSv2 to fetch 
even generic entities in sorted order.

So how about we provide a PUBLIC interface which ATSv2 users like Tez can 
implement to decide how to encode and decode a particular entity type so that 
it is stored in sorted fashion in ATSv2? Say, something like an 
EntityIDConverter interface with an encode function (takes a String and outputs 
an encoded byte array) and a decode function (takes a byte array and converts 
it into its String equivalent).
ATSv2 can have a configuration which contains a list of converters against 
entity types, something like a comma-separated list of entity type to converter 
mappings.
All the collectors and readers can load these implementations if they exist in 
their classpath.
If an implementation does not exist, the default behavior specified above can 
be adopted. Or we can just carry out a full scan within the scope of the entity 
type.
A DAG ID, for instance, consists of an AppID and 4 bytes of DAG seq number. So 
we can write an encode function which outputs a 16-byte array with 8 bytes 
of inverted cluster timestamp from the AppID, 4 bytes of inverted sequence 
number (from the AppID) and 4 bytes of inverted DAG seq number. This will ensure 
DAGs are stored in descending sorted order. Such an implementation can, for 
instance, be provided by Tez.
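
A sketch of what such a converter could look like, written in Python for brevity rather than as the proposed Java interface. The class name DagIdConverter and the ID layout dag_<clusterTimestamp>_<appSeq>_<dagSeq> (with a 4-digit app sequence) are assumptions; a real implementation would come from the application, e.g. Tez.

```python
import struct

LONG_MAX = 2**63 - 1
INT_MAX = 2**31 - 1

class DagIdConverter:
    """Hypothetical converter for IDs shaped like 'dag_<clusterTs>_<appSeq>_<dagSeq>'."""

    def encode(self, entity_id: str) -> bytes:
        _, cluster_ts, app_seq, dag_seq = entity_id.split("_")
        # Invert each component so that larger (newer) values yield smaller bytes.
        return struct.pack(">qii",
                           LONG_MAX - int(cluster_ts),
                           INT_MAX - int(app_seq),
                           INT_MAX - int(dag_seq))

    def decode(self, row_bytes: bytes) -> str:
        inv_ts, inv_app, inv_dag = struct.unpack(">qii", row_bytes)
        return "dag_{}_{:04d}_{}".format(
            LONG_MAX - inv_ts, INT_MAX - inv_app, INT_MAX - inv_dag)

converter = DagIdConverter()
ids = ["dag_1471931266232_0008_2",
       "dag_1471931266232_0008_10",
       "dag_1471931266232_0007_3"]

# Sorting the encoded keys byte-wise mimics how HBase orders row keys.
stored_order = [converter.decode(k) for k in sorted(converter.encode(i) for i in ids)]
```

The encoded keys are 16 bytes each, and stored_order lists the newest DAG of the newest application first.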

Will such a solution be acceptable to Tez ?




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470739#comment-15470739
 ] 

Varun Saxena commented on YARN-5585:


bq. But problem is entities may not stored in their creation time order.  Say, 
if entities are stored in a row    then the above approach would fail. Because say fromId=Entity-2 expecting 
to get Entity-3 but createtime is T3 which empty rows are retrieved from HBase.
Not really. From the description of fromId in ATSv1, "If fromId is not null, 
retrieve entities earlier than and including the specified ID. If no start time 
is found for the specified ID, an empty list of entities will be returned. ". 
We won't be specifying the entity id in the row key. I am not suggesting 
specifying it as the stop row. So it's not as if we will start or stop at a 
specific row.
The solution suggested above still means we will parse each and every entity 
row within the scope of the entity type and compare against the created time.

This is not a great solution because we still have a large scan, but it is 
better than fetching everything back to the client. I will propose some other 
solution.





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470687#comment-15470687
 ] 

Rohith Sharma K S commented on YARN-5585:
-

bq. fetch the specific entity mentioned in fromId using a HBase Get and get 
created time field for that entity ID.
Right, this was the first approach considered. Li Lu also suggested the 
same approach, similar to what ATS 1.5 does. But the problem is that entities 
may not be stored in their creation time order. Say, if entities are stored in a row *   * then the above approach would fail. 
Because, say, with fromId=Entity-2 we expect to get Entity-3, but its created 
time is T3, so empty rows are retrieved from HBase.  




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470657#comment-15470657
 ] 

Varun Saxena commented on YARN-5585:


Hmm... Then what we can do is fetch the specific entity mentioned in fromId 
using an HBase Get and get the created time field for that entity ID.
And then create a Scan with a Single Column Value filter to fetch entities with 
a created time less than that of this entity. This, however, would still lead to 
all the entities within the scope of the entity type being iterated over. 
However, the result set returned to the client will be relatively smaller due to 
the created time based single column value filter.
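
The Get-then-Scan idea can be modelled roughly as below. This is plain Python standing in for HBase calls, with a dictionary as the entity table and a hypothetical function name scan_older_than. It also counts the visited rows, to show that the filter shrinks the result but not the scan.

```python
def scan_older_than(entity_rows, from_id):
    """entity_rows: dict of entity id -> created time, standing in for the entity table."""
    # Step 1: the equivalent of an HBase Get for the fromId entity.
    pivot_time = entity_rows.get(from_id)
    if pivot_time is None:
        return [], 0
    # Step 2: the equivalent of a Scan with a created-time filter. Every row in
    # scope is still visited; the filter only reduces what is returned.
    scanned, matches = 0, []
    for entity_id in sorted(entity_rows):  # row key (entity id) order
        scanned += 1
        if entity_rows[entity_id] < pivot_time:
            matches.append(entity_id)
    return matches, scanned

# Vertex IDs and created times from the sample output posted in this thread.
rows = {
    "tez_vertex_1471931266232_0008_1_00": 1471939605434,
    "tez_vertex_1471931266232_0008_1_01": 1471939605405,
    "tez_vertex_1471931266232_0008_1_02": 1471939605414,
}
older, scanned = scan_older_than(rows, "tez_vertex_1471931266232_0008_1_00")
```

For the three sample vertices, two entities are returned but all three rows are visited.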

Let me think more over it and see if there could be a better solution.




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470538#comment-15470538
 ] 

Rohith Sharma K S commented on YARN-5585:
-

bq. Can you tell me the use case ? Listing all DAGs' or listing DAGs' within an 
app ? Or something else ? Typically how many DAGs' can there be per app ?
The basic use case is to achieve pagination, regardless of whether the 
entities are DAGs, containers or any other user entities. Currently the limit 
is 100 for any entities being retrieved. Say the number of entities is 200. 
Then the REST call retrieves 100 entities. How do we retrieve entities 100 to 
200?




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470469#comment-15470469
 ] 

Varun Saxena commented on YARN-5585:


Thanks Rohith for the patch.
A single column value filter with CompareOp.LESS won't work.
This is because the comparison will be lexicographic (as HBase BinaryComparator 
is used).
Assume the intention here is to fetch DAG IDs. A DAG ID is of the form 
dag_14567890123_0049_11. Ideally dag_14567890123_0049_6 would be less than 
dag_14567890123_0049_11, but when we use SingleColumnValueFilter with 
BinaryComparator, dag_14567890123_0049_6 will be considered greater, which is 
wrong.
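
The ordering problem can be reproduced without HBase. The sketch below is 
plain Java with hypothetical class and method names: compareBytes is a 
stand-in for a byte-wise lexicographic comparison, and compareBySuffix shows 
the numeric ordering one would intuitively expect for the trailing sequence 
number.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not HBase code.
class LexicographicIdOrder {

  /** Byte-wise unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  /** Compares two IDs by the number after the last underscore. */
  static int compareBySuffix(String a, String b) {
    long x = Long.parseLong(a.substring(a.lastIndexOf('_') + 1));
    long y = Long.parseLong(b.substring(b.lastIndexOf('_') + 1));
    return Long.compare(x, y);
  }

  public static void main(String[] args) {
    String dag6 = "dag_14567890123_0049_6";
    String dag11 = "dag_14567890123_0049_11";
    int lex = compareBytes(dag6.getBytes(StandardCharsets.UTF_8),
        dag11.getBytes(StandardCharsets.UTF_8));
    // Byte-wise, '6' sorts after '1', so dag6 compares greater than dag11.
    System.out.println("byte-wise: " + Integer.signum(lex));
    // Numerically, 6 is less than 11.
    System.out.println("numeric suffix: "
        + Integer.signum(compareBySuffix(dag6, dag11)));
  }
}
```

The two comparisons disagree on the same pair of IDs, which is why a LESS 
filter on the raw ID bytes would select the wrong entities.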

Can you tell me the use case? Listing all DAGs, or listing DAGs within an app? 
Or something else? Typically, how many DAGs can there be per app?
Based on that, a solution can be thought of, because till now I was thinking 
that fromId was required for apps.

Moreover, a couple of nits:
* Single column value filter need not be wrapped inside another filter list.
* createSingleColValueFilters => createSingleColValueFilter




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470338#comment-15470338
 ] 

Rohith Sharma K S commented on YARN-5585:
-

One of the challenges in supporting fromId is whether to iterate forward or 
backward. This dilemma is mainly due to the TimelineEntity#compareTo 
implementation, which sorts in ascending order of ID if the creation time is 
not set and in descending order otherwise. 
Say there are entities from entity-1 to entity-10. Then 
# If entities are stored without creation time, getEntities outputs them in 
ascending order of ID, i.e. entity-1 to entity-10.
# If entities are stored with creation time, getEntities outputs them in 
descending order of ID, i.e. entity-10 to entity-1.

When limit and fromId are applied in both cases, the expected output changes 
accordingly. 
Say limit=5 and fromId=entity-5. In the first case, the user expects entity-6 
to entity-10, because the user has already got entity-1 to entity-5 on the 
previous call. In the second case, the user expects entity-4 to entity-1. 
Differentiating between forward and backward iteration at the storage level is 
a tedious task. 

I think this cannot be achieved at the storage level anyway, because entities 
are stored asynchronously. It can be achieved in TimelineEntityReader with 
another loop. The con is that this adds another loop to the code. 
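
As a rough sketch of the "another loop" idea, assuming the entities have 
already been sorted into the order in which they are returned to the user, the 
next page is simply whatever follows fromId in that order. The class and method 
names below are hypothetical and this is not the TimelineEntityReader code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only.
class FromIdPagingSketch {

  /**
   * Returns up to limit IDs that follow fromId in the given, already sorted
   * list. A null fromId means "start from the beginning"; an unknown fromId
   * yields an empty page.
   */
  static List<String> nextPage(List<String> sorted, String fromId, int limit) {
    int start = 0;
    if (fromId != null) {
      int idx = sorted.indexOf(fromId);
      if (idx < 0) {
        return new ArrayList<>();
      }
      start = idx + 1;
    }
    int end = Math.min(sorted.size(), start + limit);
    return new ArrayList<>(sorted.subList(start, end));
  }

  public static void main(String[] args) {
    List<String> ascending = Arrays.asList("entity-1", "entity-2", "entity-3",
        "entity-4", "entity-5", "entity-6", "entity-7", "entity-8",
        "entity-9", "entity-10");
    List<String> descending = new ArrayList<>(ascending);
    java.util.Collections.reverse(descending);
    // Case 1 (no creation time): entity-6 to entity-10.
    System.out.println(nextPage(ascending, "entity-5", 5));
    // Case 2 (creation time set): entity-4 to entity-1.
    System.out.println(nextPage(descending, "entity-5", 5));
  }
}
```

Because the helper works on the output order, it does not need to know whether 
that order is ascending or descending.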




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469802#comment-15469802
 ] 

Rohith Sharma K S commented on YARN-5585:
-

bq. Are we selecting entities whose ID is less than start value, or we're 
filtering them out? According to your description fromId = app-5 should return 
something like app-6 to 10, right? I think it's very important to clearly 
define the exact meaning of "fromId"?
*fromId* is for users to pass as a query parameter in the REST URL, similar to 
limit. When entities are retrieved from storage, i.e. HBase, only entities 
whose ID is less than the start value are given to the HBase client. The HBase 
client then processes this ResultScanner and returns the entities. 
Ex: Assume that *entity-1, entity-2 ... entity-10* are stored in HBase in a row. 
Current behavior without fromId: 
# When a REST call is made to obtain entities, the output is 
*entity-10, entity-9 ... entity-2, entity-1*. 
# When a REST call is made along with the filter {{limit=5}}, the output is 
*entity-10, entity-9 ... entity-6*. Note that limit is not applied at the 
storage level. Rather, limit is applied on the scanned rows, i.e. the HBase 
ResultScanner gives *ALL* the rows, i.e. entity-1 to entity-10, and 
{{TimelineEntityReader#readEntities}} limits the number of rows given to the 
user. 

After the patch, i.e. with fromId as a filter: 
# When a REST call is made along with the filters {{limit=5}} and 
{{fromId=entity-6}}, *HBase itself gives only the rows which are less than 
entity-6*, i.e. entity-5 to entity-1. This is a significant optimization over 
processing all the rows at the HBase client, i.e. in 
{{TimelineEntityReader#readEntities}}.

Basically, to the user, fromId is nothing but the starting point for the next 
set of entities.

bq. Because we're selecting entities starting from a given ID, can we directly 
pass in the fromID's key when creating the scan? In this way seems like we 
saved one filter? For example, if fromId is not provided, we may want to scan 
from cluster!user!flow!flowrun!appId!type, but if fromId is provided, we can 
start from cluster!user!flow!flowrun!appId!type!fromId (or the next available 
entity)?
This is a good point. But as you said in an earlier comment, entities are not 
stored in order. They can be stored like 
entity-9, entity-5, entity-6, entity-2 ... entity-10. So, IIUC, this cannot be 
achieved.

bq. For pagination on containers, why do we need to care about actual creation 
time when the entity ids have already been sorted? This said, supporting 
paginations for generic timeline entities should not be blocked by YARN-5094?
Any entities with creationTime set are returned in descending order of 
entityId. If creationTime is not set, the result is in the reverse order, i.e. 
ascending order of entityId. This is because of the implementation of 
{{TimelineEntity#compareTo}}. So, say {{limit=2 and fromId=entity-6}}. Then 
the rows retrieved from storage are entity-5 to entity-1, and the REST output 
given to the user is entity-1 and entity-2 rather than entity-5 and entity-4. 
This is because of the {{TimelineEntityReader#readEntities}} implementation. 
YARN-5094 blocks testing of YARN_CONTAINER entities because most of the events 
have a creation time of -1, so the result will always be the first N 
containers when fromId is used. I have tested with a TEZ application, where 
fromId works the right way. 





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469513#comment-15469513
 ] 

Li Lu commented on YARN-5585:
-

Thanks [~rohithsharma]! The approach is generally fine. However, I have some 
points of confusion:
1. I'm a little bit confused by the usage of HBase filters here. When 
performing fromId, we create a filter like this:
{code}
Filter singleColValFilterStart = createHBaseSingleColValueFilter(
    column.getColumnFamilyBytes(), column.getColumnQualifierBytes(),
    column.getValueConverter().encodeValue(startValue),
    CompareOp.LESS, true);
{code}
Are we selecting entities whose ID is less than the start value, or are we 
filtering them out? According to your description, fromId = app-5 should return 
something like app-6 to app-10, right? I think it's very important to clearly 
define the exact meaning of "fromId". 

2. Because we're selecting entities starting from a given ID, can we directly 
pass in the fromId's key when creating the scan? This way, it seems like we 
save one filter. For example, if fromId is not provided, we may want to scan 
from cluster!user!flow!flowrun!appId!type, but if fromId is provided, we can 
start from cluster!user!flow!flowrun!appId!type!fromId (or the next available 
entity). 

3. For pagination on containers, why do we need to care about the actual 
creation time when the entity IDs have already been sorted? That said, 
supporting pagination for generic timeline entities should not be blocked by 
YARN-5094, right? 
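
To make point 2 concrete, here is a minimal sketch of the start-row idea using 
a sorted in-memory map in place of an HBase table. It is plain Java, the class 
and method names are hypothetical, and it assumes row keys are plain strings 
that sort in ID order, which the real row key encoding may not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only: a TreeMap stands in for a sorted table.
class StartRowScanSketch {

  /**
   * Returns up to limit entity IDs stored under prefix. If fromId is given,
   * the scan starts at prefix + fromId (or the next available row) instead
   * of at the beginning of the prefix.
   */
  static List<String> scan(NavigableMap<String, String> table, String prefix,
      String fromId, int limit) {
    String startRow = (fromId == null) ? prefix : prefix + fromId;
    List<String> out = new ArrayList<>();
    for (String rowKey : table.tailMap(startRow, true).keySet()) {
      // Stop once we leave the prefix or have collected enough rows.
      if (!rowKey.startsWith(prefix) || out.size() >= limit) {
        break;
      }
      out.add(rowKey.substring(prefix.length()));
    }
    return out;
  }

  public static void main(String[] args) {
    String prefix = "cluster!user!flow!flowrun!appId!type!";
    NavigableMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 5; i++) {
      table.put(prefix + "entity-" + i, "");
    }
    System.out.println(scan(table, prefix, null, 2));
    System.out.println(scan(table, prefix, "entity-3", 2));
  }
}
```

Rows before the start row are never visited, so no extra filter is needed for 
them. Note that the start row is inclusive in this sketch.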




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468719#comment-15468719
 ] 

Hadoop QA commented on YARN-5585:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice:
 The patch generated 2 new + 21 unchanged - 1 fixed = 23 total (was 22) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 36s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827175/YARN-5585.v0.patch |
| JIRA Issue | YARN-5585 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a585c24a19a2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 5f23abf |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13020/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/13020/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13020/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 U: 

[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467275#comment-15467275
 ] 

Rohith Sharma K S commented on YARN-5585:
-

YARN-5094 has to make some progress before this patch's workability with 
yarn-containers can be checked; otherwise the output will not be reliable. For 
testing, we can maybe check via other entities which have created time set 
properly. 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467262#comment-15467262
 ] 

Rohith Sharma K S commented on YARN-5585:
-

YARN web UI use case: the number of containers running in a cluster is totally 
up to the applications. The YARN web UI wants to display all the containers 
running for an application.
ATSv2 REST endpoints provide an API to retrieve entities with entity type 
YARN_CONTAINER. But the issue is that its limit is 100. Say the number of 
containers run for an application is 500. In this case, the REST endpoint 
always gives the last 100 entities, i.e. the last 100 containers run, i.e. 400 
to 500. How do I retrieve containers with IDs from 300-400 or 200-300? This is 
basically for pagination support, wherein the REST call will be made with the 
limit and fromId query parameters. Once the REST endpoint is called with 
fromId=container_400, the server should return the list of containers 300 to 
400. 





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457081#comment-15457081
 ] 

Li Lu commented on YARN-5585:
-

Thanks [~varun_saxena]! I think the discussion so far indicates that 
implementing fromId on containers seems to be hard? [~rohithsharma] could you 
please verify if Varun's current plan works for your use case? I'm trying to 
get a big picture for the use cases of fromId. Thanks! 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-01 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455180#comment-15455180
 ] 

Varun Saxena commented on YARN-5585:


bq. But I am not sure that does HBase filters can support to scan the rows 
which are less than or greater than id's.
As I indicated in my comment above, this can be easily done in HBase by setting 
the start row in the HBase Scan. Limit (use PageFilter for it) will anyway 
limit the number of entities returned.

However, this would work well only for apps within a flow run and won't work 
well for apps within a flow, as 2 flow runs may be executing simultaneously.





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-31 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454227#comment-15454227
 ] 

Rohith Sharma K S commented on YARN-5585:
-

bq. Can we translate the fromId request into some HBase filters so that we can 
process this request on the storage layer?
Ultimately this will be a better way to do it than at the TimelineReader API 
level. But I am not sure whether HBase filters support scanning the rows 
which are less than or greater than a given ID. I will have a look at this 
with high priority. This makes more sense to me. 

bq. but note that this requires some in-memory operation to actually sort all 
entities, but not only read part of them out from the storage?
This is the current behavior. Entities are already sorted using 
TimelineEntity#compareTo in {{TimelineEntityReader#readEntities}}. Another 
loop over the sorted entities is required to achieve this.

bq. The problem with this kind of an approach is that new apps keep on getting 
added so result may not be latest.
That is fine with me. I agree with Li Lu.






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453314#comment-15453314
 ] 

Li Lu commented on YARN-5585:
-

If there are two flow runs running, I believe the problem is how to define the 
meaning of "fromId". This appears to be something that requires working with 
"aggregated" data on one flow, instead of directly working on data with 
hierarchical order. IIUC the ultimate goal in this JIRA is to support 
pagination, so I think it might be helpful to fully understand the important 
use cases here. 




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-31 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453145#comment-15453145
 ] 

Varun Saxena commented on YARN-5585:


In fact, in the case of apps within a flow, there can be a problem with the 
approach above if we have 2 or more flow runs executing simultaneously.







[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-31 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453109#comment-15453109
 ] 

Varun Saxena commented on YARN-5585:


bq. Can we translate the fromId request into some HBase filters
Yes, for fetching apps within a flow run, we can set the start row in the HBase 
scan to achieve this. The application ID part of the Application table row key is 
stored as 12 bytes (inverted cluster timestamp of 8 bytes and inverted sequence 
number of 4 bytes). So within the scope of a flow run, we can use fromId as the 
application ID part while specifying the start row in the HBase scan.

For getting apps within a flow, in addition to the app ID (received from fromId), 
we can specify the flow run ID as the inverted value of the max value of long, 
i.e. 0, and set this as the start row in the HBase scan. This would require 
comparatively more matches but should be fine, as we will be doing a row key 
prefix match.
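
As a rough illustration of the start-row idea above, the following Python sketch encodes an application ID as 12 bytes (inverted 8-byte cluster timestamp plus inverted 4-byte sequence number) and simulates a scan with a sorted list. It is not the actual ATSv2 row key code: the helper names `app_key` and `scan` are hypothetical, the real row key also carries cluster, user, flow and flow run parts in front of the app ID, and the sample application IDs are made up.

```python
import bisect
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE
INT_MAX = 2**31 - 1   # same value as Java's Integer.MAX_VALUE


def app_key(app_id: str) -> bytes:
    """Encode 'application_<clusterTimestamp>_<sequence>' as 12 bytes.

    Both numbers are inverted (MAX - value) and written big-endian, so a
    plain byte-wise sort puts the newest application first.
    """
    _, cluster_ts, seq = app_id.split("_")
    return struct.pack(">qi", LONG_MAX - int(cluster_ts), INT_MAX - int(seq))


def scan(sorted_keys, start_row, limit):
    """Stand-in for a scan with an inclusive start row and a row limit."""
    start = bisect.bisect_left(sorted_keys, start_row)
    return sorted_keys[start:start + limit]


# Hypothetical applications within one flow run.
apps = [f"application_1472000000000_{n:04d}" for n in range(1, 6)]
by_key = {app_key(a): a for a in apps}
rows = sorted(by_key)  # byte order, i.e. newest app first

# fromId becomes the application ID part of the start row.
from_id = "application_1472000000000_0003"
page = [by_key[k] for k in scan(rows, app_key(from_id), 2)]
print(page)

# Inverting the max long gives 0, the smallest possible flow run part.
flow_run_start = struct.pack(">q", LONG_MAX - LONG_MAX)  # eight zero bytes
```

Because the start row is inclusive in this sketch, the page begins with the fromId application itself and continues with older ones.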




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453070#comment-15453070
 ] 

Li Lu commented on YARN-5585:
-

Can we translate the fromId request into some HBase filters so that we can 
process this request on the storage layer? I agree with [~varun_saxena] that 
supporting fromId for containers may be different. Containers are not a top-level 
concept for the timeline service, so unless there is a strong enough reason, I'd 
be inclined not to introduce a separate table for containers. 

bq. But once rows are retrieved from HBase, it is sorted as 
TimelineEntity#compareTo provided. 
We can certainly do this, but note that this requires an in-memory operation 
to actually sort all entities, rather than reading only part of them out of the 
storage. 

bq. However, the problem with this kind of an approach is that new apps keep on 
getting added so result may not be latest.
I'm fine if the results are not the "latest". As long as the system behaves in a 
linearizable fashion (results are consistent according to time), we're fine. 
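
To make the trade-off above concrete, here is a small Python sketch contrasting the two options: sorting all fetched entities in memory before applying fromId and limit, versus seeking to a start key in storage order and reading only one page. The entity IDs, the `created` field and both function names are hypothetical and only stand in for the real reader and for TimelineEntity#compareTo.

```python
import bisect


def page_in_memory(rows, from_id, limit):
    """Read every row, sort newest-first by creation time, then slice."""
    ordered = sorted(rows, key=lambda r: r["created"], reverse=True)
    start = next(i for i, r in enumerate(ordered) if r["id"] == from_id)
    page = [r["id"] for r in ordered[start:start + limit]]
    return page, len(rows)  # every stored row had to be read


def page_on_storage(rows_by_key, from_id, limit):
    """Seek to from_id in row-key order and read only `limit` rows."""
    keys = [r["id"] for r in rows_by_key]
    start = bisect.bisect_left(keys, from_id)
    page = keys[start:start + limit]
    return page, len(page)  # only the returned rows were read


# Stored in row-key (ID) order, which is not creation-time order.
rows = [
    {"id": "e1", "created": 30},
    {"id": "e2", "created": 10},
    {"id": "e3", "created": 40},
    {"id": "e4", "created": 20},
]

print(page_in_memory(rows, "e1", 2))
print(page_on_storage(rows, "e1", 2))
```

In this toy data the two approaches return different pages for the same fromId, and the in-memory variant touches every row while the storage-side variant touches only the page it returns.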





[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-31 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452583#comment-15452583
 ] 

Varun Saxena commented on YARN-5585:


If the use case is only for apps, then the row keys in the application table are 
stored in sorted order (descending) within the scope of a flow / flow run.
And we can easily support fromId along with limit to achieve some sort of 
pagination here without any performance penalty.

However, the problem with this kind of approach is that new apps keep getting 
added, so the result may not be the latest. For instance, say there are 100 apps, 
app100-app1, in ATS and we show 10 apps on each page. Then, if we move to page 3, 
we will show apps app80-app71, but it is possible that, say, 5 more apps get 
added in the meantime, i.e. we now have app105 to app1 in ATS.
Ideally page 3 should then show app85-app76.

Entities in the entity table, though, are not sorted, because an entity could be 
anything. If we have a similar use case for containers, we can consider separating 
them out into a different table and having special handling for it. But there 
should be a use case for it.
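
The page-drift example above (100 apps, 10 per page, 5 new apps arriving) can be reproduced with a short Python sketch. The function names are hypothetical; the lists simply model app IDs sorted newest-first.

```python
def page_by_number(apps_desc, page, page_size):
    """Page N counted from the newest app at the time of the request."""
    start = (page - 1) * page_size
    return apps_desc[start:start + page_size]


def page_from_id(apps_desc, from_id, page_size):
    """Page anchored at from_id (inclusive), regardless of newer apps."""
    start = apps_desc.index(from_id)
    return apps_desc[start:start + page_size]


before = [f"app{i}" for i in range(100, 0, -1)]  # app100 .. app1
after = [f"app{i}" for i in range(105, 0, -1)]   # 5 new apps arrived

print(page_by_number(before, 3, 10))    # app80 .. app71
print(page_by_number(after, 3, 10))     # app85 .. app76
print(page_from_id(after, "app80", 10)) # still app80 .. app71
```

A fromId-anchored page stays stable when new apps arrive, which is exactly why it no longer matches "page 3 of the latest list".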




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-31 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451835#comment-15451835
 ] 

Rohith Sharma K S commented on YARN-5585:
-

bq. entities may not stored in their creation time order (although may quite 
possibly be so). Therefore this is slightly different to the fromId in ATS v1, 
where it will "retrieve entities earlier than and including the specified ID".
Right, entities are not stored in creation time order. But once rows are 
retrieved from HBase, they are sorted as per TimelineEntity#compareTo. 
Say rows fetched are in format  like 
HBase stored entities in a row and fetched in the order    
  
then after sorting      are returned. So I 
think the behavior remains the same. Here, if fromId=e3, then e2 and e1 can be returned. 

I think I see one potential issue: because *entities may not stored in their 
creation time order*, the filters *createdtimestart* and *createdtimeend* might 
not work properly. Say these two filters are set to c1 and c5 respectively; 
then HBase will return an empty result scanner, I think, since c1 comes after c5 
in a row. This behavior needs to be confirmed from the HBase end. Testing this 
behavior is blocked by YARN-5094 :-(




[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-08-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450450#comment-15450450
 ] 

Li Lu commented on YARN-5585:
-

OK, let me make sure I fully understand it here: the ultimate use case is 
pagination. To support this, we need our APIs to take not only a number limit, 
but also the start position of the returned values. One thing to note is that 
in ATS v2, entities may not be stored in their creation time order (although 
they quite possibly are). Therefore this is slightly different from the fromId in 
ATS v1, where it will "retrieve entities earlier than and including the 
specified ID". I'm fine with the proposal to add this feature, but we may want 
to point out to users the difference between the two versions of the timeline 
APIs. Am I missing something here? 
