[jira] [Comment Edited] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675693#comment-15675693
 ] 

Varun Saxena edited comment on YARN-5739 at 11/18/16 4:29 AM:
--

[~vrushalic], going by the description of each filter, FirstKeyOnlyFilter 
returns only the first KV from each row, while KeyOnlyFilter returns only the 
key of each KV. So shouldn't KeyOnlyFilter alone be enough? I removed 
FirstKeyOnlyFilter and ran the tests Li had written, and they passed.

bq. This filter is used to limit the number of results to a specific page size. 
So it will terminate the scanning once the number of filter-passed rows is > 
the given page size on that particular Region Server.
That should be fine, I guess. We already use this filter elsewhere on the reader 
side to apply the limit that comes in as a query param. Here we only need one 
row. Even if we get one row per Region Server, the result is a superset; once 
the result set is created it is sorted so that the keys are in order, and we 
fetch only the first one.
However, setCaching should be fine in our use case too. I am not sure why we 
use PageFilter instead of setCaching to apply the limit. Do you know the pros 
and cons of one over the other?
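
To make the "one row per Region Server, then sort and take the first" reasoning concrete, here is a minimal stdlib-only sketch. It does not call HBase; it only models a page limit that is enforced independently per region, followed by the client-side sort. The class name, method names and the sample row keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Stdlib-only model of a per-region page limit followed by a client-side merge. */
final class PerRegionLimitSketch {

  /** Each inner list stands for the sorted row keys held by one region. */
  static List<String> scanWithPerRegionLimit(List<List<String>> regions, int pageSize) {
    List<String> merged = new ArrayList<>();
    for (List<String> region : regions) {
      // The limit is applied independently per region, so every region
      // may contribute up to pageSize rows to the overall result.
      merged.addAll(region.subList(0, Math.min(pageSize, region.size())));
    }
    return merged;
  }

  /** Client-side step: sort the superset and keep only the first `limit` keys. */
  static List<String> firstN(List<String> superset, int limit) {
    List<String> sorted = new ArrayList<>(superset);
    Collections.sort(sorted);
    return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
  }
}
```

With three regions and a page size of 1, the model returns three rows, and firstN(..., 1) reduces them to the single smallest key. The real filter's behaviour across regions may differ in detail from this simplification.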


was (Author: varun_saxena):
[~vrushalic], going by the description of each filter, FirstKeyOnlyFilter 
returns only the first KV from each row, while KeyOnlyFilter returns only the 
key of each KV. So shouldn't KeyOnlyFilter alone be enough?

bq. This filter is used to limit the number of results to a specific page size. 
So it will terminate the scanning once the number of filter-passed rows is > 
the given page size on that particular Region Server.
That should be fine, I guess. We already use this filter elsewhere on the reader 
side to apply the limit that comes in as a query param. Here we only need one 
row. Even if we get one row per Region Server, the result is a superset; once 
the result set is created it is sorted so that the keys are in order, and we 
fetch only the first one.
However, setCaching should be fine in our use case too. I am not sure why we 
use PageFilter instead of setCaching to apply the limit. Do you know the pros 
and cons of one over the other?

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch
>
>
> Right now we only show a part of available timeline entity data in the new 
> YARN UI. However, some data (especially library specific data) are not 
> possible to be queried out by the web UI. It will be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given this kind of call is relatively rare (compare to 
> writes and updates) we can perform some scanning during the read time. 
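
As a rough illustration of "scanning during the read time", the sketch below lists the distinct entity types of one application from a sorted set of row keys by jumping past each type once it has been seen. It is a stdlib-only model, not the patch's implementation: the key layout appId!entityType!entityId, the '!' separator, and all names here are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Stdlib-only model: list distinct entity types by skipping over each type's rows. */
final class EntityTypeScanSketch {
  // Hypothetical separator; entity types are assumed not to contain it.
  static final char SEP = '!';

  static List<String> listEntityTypes(NavigableSet<String> rowKeys, String appId) {
    List<String> types = new ArrayList<>();
    String prefix = appId + SEP;
    String next = rowKeys.ceiling(prefix);
    while (next != null && next.startsWith(prefix)) {
      int end = next.indexOf(SEP, prefix.length());
      if (end < 0) {
        // Key does not follow the assumed layout; skip it.
        next = rowKeys.higher(next);
        continue;
      }
      String type = next.substring(prefix.length(), end);
      types.add(type);
      // Seek past all remaining rows of this type: the character right after
      // SEP sorts after every key that starts with prefix + type + SEP.
      next = rowKeys.ceiling(prefix + type + (char) (SEP + 1));
    }
    return types;
  }
}
```

Each entity type costs one lookup in this model, regardless of how many entities of that type exist, which is the property that keeps a read-time scan affordable for a rarely used call.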



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650551#comment-15650551
 ] 

Varun Saxena edited comment on YARN-5739 at 11/9/16 10:26 AM:
--

bq. WRT caching, I am wondering, if there might be a query coming next for the 
details of these entity types?
Caching is set per Scan, right? Alternatively, PageFilter can be used to 
retrieve only one record per RS (Region Server) from the backend.
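
For what it is worth, the difference between caching and a row limit can be modelled without HBase. In the sketch below, `caching` only controls how many rows are pulled per simulated round trip, while `limit` is what actually stops the scan. This is a simplified stdlib-only model with made-up names, not the HBase client code.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model: caching sets the batch size per round trip, limit caps the rows. */
final class CachingVersusLimitSketch {

  /** Rows handed to the caller and the number of simulated round trips used. */
  static final class ScanStats {
    final List<String> rows;
    final int roundTrips;

    ScanStats(List<String> rows, int roundTrips) {
      this.rows = rows;
      this.roundTrips = roundTrips;
    }
  }

  /**
   * Reads `rows` in batches of `caching` rows per round trip and stops once
   * `limit` rows have been collected (limit <= 0 means "no limit").
   */
  static ScanStats scan(List<String> rows, int caching, int limit) {
    if (caching <= 0) {
      throw new IllegalArgumentException("caching must be positive");
    }
    List<String> out = new ArrayList<>();
    int roundTrips = 0;
    int pos = 0;
    while (pos < rows.size() && (limit <= 0 || out.size() < limit)) {
      roundTrips++;
      int end = Math.min(pos + caching, rows.size());
      for (int i = pos; i < end && (limit <= 0 || out.size() < limit); i++) {
        out.add(rows.get(i));
      }
      pos = end;
    }
    return new ScanStats(out, roundTrips);
  }
}
```

In this model, a caching value of 1 with no limit still returns every row, just one round trip at a time; only the limit shortens the scan. Because the value is a parameter of each scan call, a follow-up query for entity details would be free to choose its own.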


was (Author: varun_saxena):
bq. WRT caching, I am wondering, if there might be a query coming next for the 
details of these entity types?
Caching is set per Scan, right? Alternatively, PageFilter can be used to 
retrieve only one record from the backend.







[jira] [Comment Edited] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-08 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649165#comment-15649165
 ] 

Vrushali C edited comment on YARN-5739 at 11/8/16 11:21 PM:


Thanks [~gtCarrera9] for the patch. I wish to add to [~varun_saxena]'s review 
suggestions.

- Agree with the suggestion to rename the REST endpoint, but I think we should 
not use "-" in the REST endpoint string. So perhaps something like {noformat} 
/apps/{appid}/entitytypes {noformat}
- Yes, we need not set max versions at L159 in EntityTypeReader.
- WRT caching, I am wondering, if there might be a query coming next for the 
details of these entity types?

For the scan/filter, I have a different suggestion:

It looks like we want to return only entity types. Entity types are part of the 
row keys, so we don't need the column qualifiers and values. We can therefore 
consider using KeyOnlyFilter. This filter returns only the key component of 
each KV (the value is rewritten as empty), so it can be used to grab all of the 
keys without also grabbing the values. For a table scan where only the row keys 
are needed (no families, qualifiers, values or timestamps), add a FilterList 
with a MUST_PASS_ALL operator to the scanner using setFilter. The filter list 
should include both a FirstKeyOnlyFilter and a KeyOnlyFilter. 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html
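
To show what the combination is expected to do, here is a small stdlib-only model of the two filters and their MUST_PASS_ALL composition. It does not use the HBase classes; the Cell type and the method names are invented for the example, and the input is assumed to be sorted so that cells of the same row are adjacent.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of FirstKeyOnlyFilter, KeyOnlyFilter and their combination. */
final class KeyFilterSketch {

  /** Simplified cell: row key, column qualifier and value. */
  static final class Cell {
    final String row;
    final String qualifier;
    final String value;

    Cell(String row, String qualifier, String value) {
      this.row = row;
      this.qualifier = qualifier;
      this.value = value;
    }
  }

  /** Models FirstKeyOnlyFilter: keep only the first cell of each row. */
  static List<Cell> firstCellPerRow(List<Cell> cells) {
    List<Cell> out = new ArrayList<>();
    String lastRow = null;
    for (Cell c : cells) {
      if (!c.row.equals(lastRow)) {
        out.add(c);
        lastRow = c.row;
      }
    }
    return out;
  }

  /** Models KeyOnlyFilter: keep every cell but rewrite its value as empty. */
  static List<Cell> stripValues(List<Cell> cells) {
    List<Cell> out = new ArrayList<>();
    for (Cell c : cells) {
      out.add(new Cell(c.row, c.qualifier, ""));
    }
    return out;
  }

  /** Models a MUST_PASS_ALL list holding both filters. */
  static List<Cell> firstKeyAndKeyOnly(List<Cell> cells) {
    return stripValues(firstCellPerRow(cells));
  }
}
```

In this model, the value-stripping step alone still yields every row key, once per cell; adding the first-cell step only reduces the output to one entry per row.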






was (Author: vrushalic):
Thanks [~gtCarrera9] for the patch. I wish to add to [~varun_saxena]'s review 
suggestions.

- Agree with the suggestion to rename the REST endpoint, but I think we should 
not use "-" in the REST endpoint string. So perhaps something like {preformat} 
/apps/{appid}/entitytypes {preformat}
- Yes, we need not set max versions at L159 in EntityTypeReader.
- WRT caching, I am wondering, if there might be a query coming next for the 
details of these entity types?

For the scan/filter, I have a different suggestion:

It looks like we want to return only entity types. Entity types are part of the 
row keys, so we don't need the column qualifiers and values. We can therefore 
consider using KeyOnlyFilter. This filter returns only the key component of 
each KV (the value is rewritten as empty), so it can be used to grab all of the 
keys without also grabbing the values. For a table scan where only the row keys 
are needed (no families, qualifiers, values or timestamps), add a FilterList 
with a MUST_PASS_ALL operator to the scanner using setFilter. The filter list 
should include both a FirstKeyOnlyFilter and a KeyOnlyFilter. 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html










