[kudu-CR] WIP KUDU-16 pt 3: add per-scan-token limits

2020-06-22 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9923 )

Change subject: WIP KUDU-16 pt 3: add per-scan-token limits
..


Patch Set 1:

Revisiting this. I was a bit surprised to see the limit used in the Java 
client, but not in the C++ client. Given the limit is already defined and used 
on the scan token it seams like plumbing it through cant hurt and would make 
the client implementations more consistent.


--
To view, visit http://gerrit.cloudera.org:8080/9923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I655f07f10a9a99e9766402729361de97e929136a
Gerrit-Change-Number: 9923
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 22 Jun 2020 15:23:51 +
Gerrit-HasComments: No


[kudu-CR] WIP KUDU-16 pt 3: add per-scan-token limits

2018-04-09 Thread Andrew Wong (Code Review)
Andrew Wong has abandoned this change. ( http://gerrit.cloudera.org:8080/9923 )

Change subject: WIP KUDU-16 pt 3: add per-scan-token limits
..


Abandoned

Deemed not that useful
--
To view, visit http://gerrit.cloudera.org:8080/9923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I655f07f10a9a99e9766402729361de97e929136a
Gerrit-Change-Number: 9923
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP KUDU-16 pt 3: add per-scan-token limits

2018-04-09 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9923 )

Change subject: WIP KUDU-16 pt 3: add per-scan-token limits
..


Patch Set 1:

> Patch Set 1:
>
> (1 comment)

After some discussion with Thomas from the Impala team, I think the consensus 
is that this API isn't that useful. Right now, Impala can't really take 
advantage of this top-level LimitPerToken API because the scan limit isn't 
known at the time of token building (although there are likely ways around this 
through updating Impala's source).

The ideal API would be one where a limit could be set per grouop of tokens and 
the tokens would be able to coordinate with each other to satisfy the exact 
limit. This is difficult to do, so it's probably off the table for now.

That said, the limit is definitely known during the re-hydrating call 
DeserializeIntoScanner(), so Impala should be able to use the API defined in 
[725c6ce6391b42331e4ab4210ae4117779cba4a2](https://github.com/apache/kudu/commit/725c6ce6391b42331e4ab4210ae4117779cba4a2)
 in a similar way to that in this patch, i.e. specifying the overall limit per 
scanner such that l * t rows are returned across all scanners and returning the 
first l, since this is still a perf win.


--
To view, visit http://gerrit.cloudera.org:8080/9923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I655f07f10a9a99e9766402729361de97e929136a
Gerrit-Change-Number: 9923
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 09 Apr 2018 21:43:58 +
Gerrit-HasComments: No


[kudu-CR] WIP KUDU-16 pt 3: add per-scan-token limits

2018-04-06 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9923 )

Change subject: WIP KUDU-16 pt 3: add per-scan-token limits
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9923/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9923/1//COMMIT_MSG@19
PS1, Line 19: sure this is actually a useful API or the one that we want to 
expose.
> We chatted about this on Slack yesterday, but my thoughts are that we shoul
I agree with both of your points; if a token is a logical concept that 
represents some optimization on a query plan, limiting probably doesn't make 
much sense (unless we're doing something fancy like sampling and joining, but 
AFAIK we're [Impala/Spark] not). That said, as an optimization for how Impala 
or some other parallel-scanning service might handles limits, it still might be 
a convenient API to have.

I took a look at Spark and it seems like limiting is handled at a different 
layer of abstraction than the one at which KuduRDD sits (something like 
https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/sql/execution/Limit.html).
 I'll have to dig a bit more, still not sure how KuduRDD fits in.

As for Impala, yeah it'd be helpful to get some feedback from the Impala devs. 
I pinged Thomas since he worked on runtime filters and runtime limits seems in 
a similar ballpark. In the meantime, I've filed IMPALA-6821 and pointed it at 
this CR.



--
To view, visit http://gerrit.cloudera.org:8080/9923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I655f07f10a9a99e9766402729361de97e929136a
Gerrit-Change-Number: 9923
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 07 Apr 2018 06:58:30 +
Gerrit-HasComments: Yes


[kudu-CR] WIP KUDU-16 pt 3: add per-scan-token limits

2018-04-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9923 )

Change subject: WIP KUDU-16 pt 3: add per-scan-token limits
..


Patch Set 1:

(1 comment)

Code LGTM, but I could go either way on whether to include this at all.

http://gerrit.cloudera.org:8080/#/c/9923/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9923/1//COMMIT_MSG@19
PS1, Line 19: sure this is actually a useful API or the one that we want to 
expose.
> yea, Im not sure this API provides much value beyond the caller just specif
We chatted about this on Slack yesterday, but my thoughts are that we should 
definitely support setting a limit after re-hydrating the token, but providing 
this API as well might be useful, and it seems like it will be low maintenance 
overhead.  I checked out the Spark integration to see if we could take 
advantage of it their, but it doesn't look like the RelationProvider interface 
exposes any kind of Limit hooks, so I guess I don't have any concrete examples 
where it will be useful.



--
To view, visit http://gerrit.cloudera.org:8080/9923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I655f07f10a9a99e9766402729361de97e929136a
Gerrit-Change-Number: 9923
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 05 Apr 2018 18:09:53 +
Gerrit-HasComments: Yes


[kudu-CR] WIP KUDU-16 pt 3: add per-scan-token limits

2018-04-04 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9923 )

Change subject: WIP KUDU-16 pt 3: add per-scan-token limits
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9923/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9923/1//COMMIT_MSG@19
PS1, Line 19: sure this is actually a useful API or the one that we want to 
expose.
yea, Im not sure this API provides much value beyond the caller just specifying 
the limit when they hydrate the token before they call Open() on the scan, 
which has the same net effect.

In general I feel like the APIs we provide for scan tokens should be things 
that offer some optimization opportunity at that "planning" stage. For example, 
it's beneficial to express predicates because it may reduce the number of 
generated tokens based on partition pruning, but for something like limit, 
especially given the odd semantics that you described, it's less obvious. A 
further reason it might be better for the user to handle the limits is that, at 
least in the case of Impala, a single backend is responsible for processing 
several tokens (in parallel, even), and so they'll still need to do their own 
limit tracking there to early-stop scanning after the limit is exhausted.

Maybe worth chatting with the Impala folks to see whether they think they would 
prefer to use an API like this to attach the limit to the scan tokens or 
whether or would be more useful for them to manage the limit themselves?



--
To view, visit http://gerrit.cloudera.org:8080/9923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I655f07f10a9a99e9766402729361de97e929136a
Gerrit-Change-Number: 9923
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 04 Apr 2018 19:23:10 +
Gerrit-HasComments: Yes


[kudu-CR] WIP KUDU-16 pt 3: add per-scan-token limits

2018-04-03 Thread Andrew Wong (Code Review)
Andrew Wong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9923


Change subject: WIP KUDU-16 pt 3: add per-scan-token limits
..

WIP KUDU-16 pt 3: add per-scan-token limits

This patch implements scan-limiting on a per-token basis. Kudu only
exposes a builder API for batches of tokens. These tokens get sent out
and instantiate scanners with any specified limit. This means that if
the builder limit is l, with t tokens, the number of rows returned will
be at most l * t. This is preferable to splitting the limit across
tokens, e.g. by limiting each token to l / t rows, as this may yield
fewer than l rows across tokens, when more than l rows actually exist,
which doesn't seem desirable.

WIP Maybe we should expose setting individual token limits. Also I'm not
sure this is actually a useful API or the one that we want to expose.

Change-Id: I655f07f10a9a99e9766402729361de97e929136a
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scan_token-test.cc
4 files changed, 37 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/9923/1
--
To view, visit http://gerrit.cloudera.org:8080/9923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I655f07f10a9a99e9766402729361de97e929136a
Gerrit-Change-Number: 9923
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong