[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


IMPALA-3481: Use Kudu ScanToken API for scan ranges

Switches the planner and KuduScanNode to use Kudu's new
ScanToken API instead of explicitly constructing scan ranges
for all tablets of a table, regardless of whether they were
needed. The ScanToken API allows Impala to specify the
projected columns and predicates during planning, and Kudu
returns a set of 'scan tokens' that represent a scanner for
each tablet that needs to be scanned. The scan tokens can
be serialized and distributed to the scan nodes, which can
then deserialize them into Kudu scanner objects. Upon
deserialization, the scan token has all scan parameters
already, including the 'pushed down' predicates. Impala no
longer needs to send the Kudu predicates to the BE and
convert them at the scan node.

This change also fixes:
1) IMPALA-4016: Avoid materializing slots only referenced
by Kudu conjuncts
2) IMPALA-3874: Predicates are not always pushed to Kudu

TODO: Consider additional planning improvements.

Testing: Updated the existing tests, verified everything
works as expected. Some BE tests no longer make sense and
they were removed.

TODO: When KUDU-1065 is resolved, add tests that demonstrate pruning.

Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Reviewed-on: http://gerrit.cloudera.org:8080/4120
Reviewed-by: Matthew Jacobs 
Tested-by: Internal Jenkins
---
M be/src/exec/kudu-scan-node-test.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/scheduling/simple-scheduler.cc
M bin/impala-config.sh
M common/thrift/PlanNodes.thrift
M fe/pom.xml
M fe/src/main/java/com/cloudera/impala/analysis/Expr.java
M fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
M fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
M fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
M fe/src/main/java/com/cloudera/impala/util/KuduUtil.java
D fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M fe/src/test/java/com/cloudera/impala/util/KuduUtilTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
25 files changed, 526 insertions(+), 1,189 deletions(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 6: Code-Review+2

rebase, carrying marcel's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Matthew Jacobs (Code Review)
Hello Dan Burkert,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4120

to look at the new patch set (#5).

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..

IMPALA-3481: Use Kudu ScanToken API for scan ranges

Switches the planner and KuduScanNode to use Kudu's new
ScanToken API instead of explicitly constructing scan ranges
for all tablets of a table, regardless of whether they were
needed. The ScanToken API allows Impala to specify the
projected columns and predicates during planning, and Kudu
returns a set of 'scan tokens' that represent a scanner for
each tablet that needs to be scanned. The scan tokens can
be serialized and distributed to the scan nodes, which can
then deserialize them into Kudu scanner objects. Upon
deserialization, the scan token has all scan parameters
already, including the 'pushed down' predicates. Impala no
longer needs to send the Kudu predicates to the BE and
convert them at the scan node.

This change also fixes:
1) IMPALA-4016: Avoid materializing slots only referenced
by Kudu conjuncts
2) IMPALA-3874: Predicates are not always pushed to Kudu

TODO: Consider additional planning improvements.

Testing: Updated the existing tests, verified everything
works as expected. Some BE tests no longer make sense and
they were removed.

TODO: When KUDU-1065 is resolved, add tests that demonstrate pruning.

Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
---
M be/src/exec/kudu-scan-node-test.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/scheduling/simple-scheduler.cc
M bin/impala-config.sh
M common/thrift/PlanNodes.thrift
M fe/pom.xml
M fe/src/main/java/com/cloudera/impala/analysis/Expr.java
M fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
M fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
M fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
M fe/src/main/java/com/cloudera/impala/util/KuduUtil.java
D fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M fe/src/test/java/com/cloudera/impala/util/KuduUtilTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
25 files changed, 526 insertions(+), 1,189 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4120/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node-test.cc
File be/src/exec/kudu-scan-node-test.cc:

Line 53: class KuduScanNodeTest : public testing::Test {
> good idea, let's leave a todo.
Done. Makes sense.


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java:

Line 33: public class KuduPlannerTest extends PlannerTestBase {
> is this very slow?
Shouldn't be. I'll move them into PlannerTest.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node-test.cc
File be/src/exec/kudu-scan-node-test.cc:

Line 53: class KuduScanNodeTest : public testing::Test {
> I can't say since I didn't add these to begin with, but while we're in the 
good idea, let's leave a todo.

the reason i don't like white-box testing is that you end up writing a lot of 
test harness code, which can have bugs, and you might end up  testing paths 
that aren't taken during actual query execution.


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 114:   if (batch_done) break;
> To avoid returning Status::OK() again, and in case logic gets added after t
no, please leave as is.


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java:

Line 33: public class KuduPlannerTest extends PlannerTestBase {
> I don't know what the original reason was, but now that it's separate it's 
is this very slow?

the planner tests typically take something like 40s to run, a few seconds on 
top won't make a difference. also, it's good to have everything in one place, 
so i don't need to remember to run two planner tests when making planner 
changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node-test.cc
File be/src/exec/kudu-scan-node-test.cc:

Line 53: class KuduScanNodeTest : public testing::Test {
> we don't have unit tests for any other exec node. why is this an exception?
I can't say since I didn't add these to begin with, but while we're in the 
process of getting our end-to-end tests up it doesn't hurt to have some of this 
coverage. I think once our functional tests are in place this can be removed.


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 247: VLOG(1) << "Thread started: " << name;
> 1 is too chatty. also, please use the macros.
Done


Line 328:   VLOG(1) << "Thread done: " << name;
> change
Done


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

Line 39: /// are used to retreive the rows for this scan.
> retrieve
Done


Line 120:   void ScannerThread(const string& name, const string* initial_token);
> rename to some verb
Done


Line 125:   Status ProcessScanToken(KuduScanner* scanner, const string* 
scan_token);
> why const*? can this be null?
no, thanks, this should be const&


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 114:   if (batch_done) break;
> why break instead of return?
To avoid returning Status::OK() again, and in case logic gets added after the 
while() loop. If you think it's simpler returning OK() here I'm happy to change 
it.


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

Line 34: /// Wraps a Kudu client scanner to fetches row batches from Kudu. The 
Kudu client scanner
> to fetch
Done


Line 114:   int cur_kudu_batch_num_scanned_;
> explain
Done


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 107: try (KuduClient client = new KuduClientBuilder(
> break before 'new' instead
Done


Line 165:   // TODO: Consider only using the LEADER replica.
> for better cache locality?
I was thinking it might be better but I don't know if there's any reason that 
might be the case. I'll take out this comment unless and we can chat about the 
scheduling policies later.


Line 167: TNetworkAddress address = new 
TNetworkAddress(replica.getRpcHost(),
> break before new instead
Done


Line 203: for (KuduPredicate predicate: kuduPredicates_) {
> single line
Done


Line 267:* bounds of the result can be evaluated with Expr::GetValue(NULL)).
> this presumable has some side effects. describe them.
Done


Line 349:   private static ComparisonOp 
getKuduOperator(BinaryPredicate.Operator op) {
> use qualified name for ComparisonOp, easier to identify that it's a "foreig
Done


Line 351: case GT: return ComparisonOp.GREATER;
> indentation
Done


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java:

Line 33: public class KuduPlannerTest extends PlannerTestBase {
> could we merge this back into the regular planner test? what was the reason
I don't know what the original reason was, but now that it's separate it's kind 
of handy for creating/destroying the KuduClient only for these tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
> you're not going to avoid a round trip to the kudu master to get tablet loc
those can all be addressed separately, but i don't see a reason to ditch 
functionality we already have. it's always harder to put it back later. and it 
would be a special case for kudu anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 139: tableSchema.getColumn(colName);
> or why not simply have getColumn() return null?
that would have been a reasonable decision too, but now that we've released it, 
changing it is an incompatible API change. In general we try to make our Java 
API JDBC-like since most enterprise Java programmers are reasonably familiar 
with JDBC.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
> one of our design principles is to avoid round-trips to central servers dur
you're not going to avoid a round trip to the kudu master to get tablet 
locations anyway... and if there's some adversarial workload that's bringing 
the master to its knees, then your backends will just hit the same issue when 
they try to open scanners (they also need to talk to the master before talking 
to a tablet)


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
> when you "open" a table, it fetches the metadata (columns, etc) from the ma
one of our design principles is to avoid round-trips to central servers during 
plan generation. (i believe your test cluster numbers, but i also believe it's 
not that hard to create an adversarial workload.)

let's remove this todo.


Line 139: tableSchema.getColumn(colName);
> We're more or less following the pattern of JDBC where trying to get an inv
or why not simply have getColumn() return null?

we have no special allegiance to jdbc.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 976: scan_range_length = 1000;
> can't the fe get that out of the scan token?
We still don't have the tablet sizes available to the client yet, it's a TODO 
item (see jira referenced above)


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
> does kudu cache all metadata? where would the column info get fetched from?
when you "open" a table, it fetches the metadata (columns, etc) from the 
master. This isn't a fan-out so it should have low tail latency. On one of our 
test clusters, 99%ile is 492us, mean 84us.


Line 139: tableSchema.getColumn(colName);
> note to kudu api designers: it's nice not to use exceptions to signal retur
We're more or less following the pattern of JDBC where trying to get an invalid 
column throws an exception rather than returning null (see 
https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgResultSetMetaData.java#L487
 )

That said, maybe catching IllegalArgumentException would be a better choice?

We could also add a new 'bool hasColumn(String name)' if you'd like.


Line 295: // Cannot push prediates with null literal values
> is there a kudu jira for this?
there is now: KUDU-1595


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node-test.cc
File be/src/exec/kudu-scan-node-test.cc:

Line 53: class KuduScanNodeTest : public testing::Test {
we don't have unit tests for any other exec node. why is this an exception?


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 247: VLOG(1) << "Thread started: " << name;
1 is too chatty. also, please use the macros.


Line 328:   VLOG(1) << "Thread done: " << name;
change


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

Line 39: /// are used to retreive the rows for this scan.
retrieve


Line 120:   void ScannerThread(const string& name, const string* initial_token);
rename to some verb


Line 125:   Status ProcessScanToken(KuduScanner* scanner, const string* 
scan_token);
why const*? can this be null?


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 114:   if (batch_done) break;
why break instead of return?


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

Line 34: /// Wraps a Kudu client scanner to fetches row batches from Kudu. The 
Kudu client scanner
to fetch


Line 114:   int cur_kudu_batch_num_scanned_;
explain


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 976: scan_range_length = 1000;
can't the fe get that out of the scan token?


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 107: try (KuduClient client = new KuduClientBuilder(
break before 'new' instead


Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
does kudu cache all metadata? where would the column info get fetched from?


Line 139: tableSchema.getColumn(colName);
note to kudu api designers: it's nice not to use exceptions to signal return 
values.

is that a sufficient test? what about data types?


Line 165:   // TODO: Consider only using the LEADER replica.
for better cache locality?


Line 167: TNetworkAddress address = new 
TNetworkAddress(replica.getRpcHost(),
break before new instead


Line 203: for (KuduPredicate predicate: kuduPredicates_) {
single line


Line 267:* bounds of the result can be evaluated with Expr::GetValue(NULL)).
this presumable has some side effects. describe them.


Line 295: // Cannot push prediates with null literal values
is there a kudu jira for this?


Line 349:   private static ComparisonOp 
getKuduOperator(BinaryPredicate.Operator op) {
use qualified name for ComparisonOp, easier to identify that it's a "foreign" 
type that way


Line 351: case GT: return ComparisonOp.GREATER;
indentation


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java:

Line 33: public class KuduPlannerTest extends PlannerTestBase {
could we merge this back into the regular planner test? what was the reason for 
separating this?


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-01 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-01 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#4).

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..

IMPALA-3481: Use Kudu ScanToken API for scan ranges

Switches the planner and KuduScanNode to use Kudu's new
ScanToken API instead of explicitly constructing scan ranges
for all tablets of a table, regardless of whether they were
needed. The ScanToken API allows Impala to specify the
projected columns and predicates during planning, and Kudu
returns a set of 'scan tokens' that represent a scanner for
each tablet that needs to be scanned. The scan tokens can
be serialized and distributed to the scan nodes, which can
then deserialize them into Kudu scanner objects. Upon
deserialization, the scan token has all scan parameters
already, including the 'pushed down' predicates. Impala no
longer needs to send the Kudu predicates to the BE and
convert them at the scan node.

This change also fixes:
1) IMPALA-4016: Avoid materializing slots only referenced
by Kudu conjuncts
2) IMPALA-3874: Predicates are not always pushed to Kudu

TODO: Consider additional planning improvements.

Testing: Updated the existing tests, verified everything
works as expected. Some BE tests no longer make sense and
they were removed.

TODO: When KUDU-1065 is resolved, add tests that demonstrate pruning.

Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
---
M be/src/exec/kudu-scan-node-test.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/scheduling/simple-scheduler.cc
M bin/impala-config.sh
M common/thrift/PlanNodes.thrift
M fe/pom.xml
M fe/src/main/java/com/cloudera/impala/analysis/Expr.java
M fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
M fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
M fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
M fe/src/main/java/com/cloudera/impala/util/KuduUtil.java
M fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M fe/src/test/java/com/cloudera/impala/util/KuduUtilTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
24 files changed, 521 insertions(+), 1,137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4120/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-30 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#3).

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..

IMPALA-3481: Use Kudu ScanToken API for scan ranges

Switches the planner and KuduScanNode to use Kudu's new
ScanToken API instead of explicitly constructing scan ranges
for all tablets of a table, regardless of whether they were
needed. The ScanToken API allows Impala to specify the
projected columns and predicates during planning, and Kudu
returns a set of 'scan tokens' that represent a scanner for
each tablet that needs to be scanned. The scan tokens can
be serialized and distributed to the scan nodes, which can
then deserialize them into Kudu scanner objects. Upon
deserialization, the scan token has all scan parameters
already, including the 'pushed down' predicates. Impala no
longer needs to send the Kudu predicates to the BE and
convert them at the scan node.

This change also fixes:
1) IMPALA-4016: Avoid materializing slots only referenced
by Kudu conjuncts
2) IMPALA-3874: Predicates are not always pushed to Kudu

TODO: Consider additional planning improvements.

Testing: Updated the existing tests, verified everything
works as expected. Some BE tests no longer make sense and
they were removed.

TODO: When KUDU-1065 is resolved, add tests that demonstrate pruning.

Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
---
M be/src/exec/kudu-scan-node-test.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/scheduling/simple-scheduler.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/com/cloudera/impala/analysis/Expr.java
M fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
M fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
17 files changed, 457 insertions(+), 1,088 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4120/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 2:

BTW, I think this actually has reasonable test coverage with the existing tests 
to get started. I think it makes sense to follow-up with additional tests for 
predicate pushdown - once more of the functional schema is loaded (alltypes*). 
I'll add new tests to demonstrate pruning once KUDU-1065 is resolved.

-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..

IMPALA-3481: Use Kudu ScanToken API for scan ranges

Switches the planner and KuduScanNode to use Kudu's new
ScanToken API instead of explicitly constructing scan ranges
for all tablets of a table, regardless of whether they were
needed. The ScanToken API allows Impala to specify the
projected columns and predicates during planning, and Kudu
returns a set of 'scan tokens' that represent a scanner for
each tablet that needs to be scanned. The scan tokens can
be serialized and distributed to the scan nodes, which can
then deserialize them into Kudu scanner objects. Upon
deserialization, the scan token has all scan parameters
already, including the 'pushed down' predicates. Impala no
longer needs to send the Kudu predicates to the BE and
convert them at the scan node.

This change also fixes:
1) IMPALA-4016: Avoid materializing slots only referenced
by Kudu conjuncts
2) IMPALA-3874: Predicates are not always pushed to Kudu

TODO: Consider additional planning improvements.

Testing: Updated the existing tests, verified everything
works as expected. Some BE tests no longer make sense and
they were removed.

TODO: When KUDU-1065 is resolved, add tests that demonstrate pruning.

Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
---
M be/src/exec/kudu-scan-node-test.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/scheduling/simple-scheduler.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/com/cloudera/impala/analysis/Expr.java
M fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
M fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
17 files changed, 434 insertions(+), 1,080 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4120/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 1:

(13 comments)

Thanks for the feedback. I made changes/follow-up comments for everything 
except for how we do the plan output printing, since Dan is going to provide a 
printing fn for us.

http://gerrit.cloudera.org:8080/#/c/4120/1/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

PS1, Line 225: range
> GetNextScanToken returns a ... "range" :). I think we should clean this a b
Done


http://gerrit.cloudera.org:8080/#/c/4120/1/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

PS1, Line 35: tablet
> Hm, maybe I am confused but a server can have multiple tablets that need to
Done


PS1, Line 76: string
> not sure if y'all follow the same style guide, but this should be std::stri
Done


Line 79:   int next_scan_token_idx_;
> I may have missed something, but it looks like you could just use scan_toke
I started changing this, but I think it's nice to just be passing string* 
around as the tokens. I could still use this as a stack if I store them 
somewhere else (e.g. a mempool), but not sure if it's worth it.


PS1, Line 125: ranges
> tokens and ranges are sort of used interchangeably. Maybe we should keep on
Done


http://gerrit.cloudera.org:8080/#/c/4120/1/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

PS1, Line 43: 'range'
> update comment?
Done


http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

PS1, Line 64: (<=, >=, ==)
> Comment needs to be updated.
Done


PS1, Line 77: // TODO use the portion of this list that pertains to keys to do 
partition pruning.
> I think we can remove this TODO now, no?
Done


Line 95: kuduTable_.getKuduMasterAddresses()).build()) {
> nit: indentation is a bit weird, maybe it's Impala style though
Done


PS1, Line 133: e master 
> I think you mean LEADER replica
Done


PS1, Line 233: <=, >=, ==
> Update comment
Done


PS1, Line 269: column
> has it already been validated that the slot refs only refer to columns that
Yeah, random exceptions might not come back nicely. I added a validate method 
which happens before this and throws a more specific error.


PS1, Line 264: if (!(predicate.getChild(0) instanceof SlotRef)) return null;
 : if (!predicate.getChild(1).isLiteral()) return null;
 : 
 : SlotRef ref = (SlotRef)predicate.getChild(0);
 : String colName = ref.getDesc().getColumn().getName();
 : ColumnSchema column = table.getSchema().getColumn(colName);
 : ComparisonOp op = 
getKuduOperator(((BinaryPredicate)predicate).getOp());
 : if (op == null) return null;
> How about putting all the logic of checking if a predicate can be pushed to
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-26 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

Line 297:   result.append(Arrays.toString(upperBound.toByteArray()));
> You can build against a SNAPSHOT java client, right? In that case we can ge
Yes we can, and that would be great. Lemme know if you want me to look at a CR.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-25 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

Line 297:   result.append(Arrays.toString(upperBound.toByteArray()));
> Yes, a toString() method would be very handy. Shall I file a JIRA? In the m
You can build against a SNAPSHOT java client, right? In that case we can get 
you an improved API with really fast turnaround. Would rather do that (eg 
tomorrow) than check in usage of private/shaded APIs.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-25 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 1:

(3 comments)

Thanks for the feedback. Just responding to the plan info issue now. Will fix 
everything else later.

http://gerrit.cloudera.org:8080/#/c/4120/1//COMMIT_MSG
Commit Message:

Line 36:   work), add TPC-DS tests that demonstrate pruning.
> I think the pruning isn't quite working on the Kudu side quite yet. You can
Ah ok, thanks.


http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

PS1, Line 281: KUDU KEYRANGE 
> Moreover, I think the important thing here isn't the partition key range, b
@Dimitris, yup.

@Todd/Dan, yeah, we'd ideally like to get a sense of what data is going to be 
scanned. I can change the name for now.


Line 297:   result.append(Arrays.toString(upperBound.toByteArray()));
> maybe we should provide you a nicer toString() type method you can use? In 
Yes, a toString() method would be very handy. Shall I file a JIRA? In the 
meantime I guess we can keep this though, assuming we switch over to using a 
new toString() method or such before we ship- seem reasonable? We just need 
some way to verify the plans.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-25 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 1:

(3 comments)

ScanToken usage LGTM.  I agree with Todd that we should provide some more debug 
printing tools if you all need it.

http://gerrit.cloudera.org:8080/#/c/4120/1/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

PS1, Line 76: string
not sure if y'all follow the same style guide, but this should be std::string 
here and below.


Line 79:   int next_scan_token_idx_;
I may have missed something, but it looks like you could just use scan_tokens_ 
as a stack and pop tokens off it for each call to GetNextScanToken.


http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

PS1, Line 281: KUDU KEYRANGE 
> KEYRANGE isn't really accurate -- this makes it seem like it's a primary ke
Moreover, I think the important thing here isn't the partition key range, but 
what tablet it's scanning.  I recommend just printing the tablet ID.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-25 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4120/1//COMMIT_MSG
Commit Message:

Line 36:   work), add TPC-DS tests that demonstrate pruning.
I think the pruning isn't quite working on the Kudu side quite yet. You can 
follow https://issues.apache.org/jira/browse/KUDU-1065


http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 95: kuduTable_.getKuduMasterAddresses()).build()) {
nit: indentation is a bit weird, maybe it's Impala style though


PS1, Line 133: e master 
I think you mean LEADER replica


PS1, Line 269: column
has it already been validated that the slot refs only refer to columns that 
exist? eg if the metastore has some outdated information about the columns, 
this will throw an IllegalArgumentException. Would that propagate up to the 
user in a reasonable way?


http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

PS1, Line 39: import org.kududb.client.Client.ScanTokenPB;
: import org.kududb.client.shaded.com.google.protobuf.ByteString;
: import 
org.kududb.client.shaded.com.google.protobuf.InvalidProtocolBufferException;
yea, this is pretty evil, there's no compatibility guarantee about these APIs 
so would strongly discourage using them


PS1, Line 281: KUDU KEYRANGE 
> Is this part used in any of the planner tests? I don't remember seeing it a
KEYRANGE isn't really accurate -- this makes it seem like it's a primary key 
range, where in fact it's a partition key range (which may not be one and the 
same)


Line 297:   result.append(Arrays.toString(upperBound.toByteArray()));
maybe we should provide you a nicer toString() type method you can use? In 
addition to being an evil use of our internal APIs, this isn't particularly 
useful for users to look at. You're just going to see something like 
[0,0,0,13,5,2,4,3,...] which is hardly friendly.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 1:

(9 comments)

Overall it looks good. At a high level, 'ranges' and 'tokens' seem to be used 
interchangeably which is a bit confusing. Also, I think we need more planner 
tests that exercise the logic of pushing predicates to Kudu.

http://gerrit.cloudera.org:8080/#/c/4120/1/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

PS1, Line 225: range
GetNextScanToken returns a ... "range" :). I think we should clean this a bit.


http://gerrit.cloudera.org:8080/#/c/4120/1/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

PS1, Line 35: tablet
Hm, maybe I am confused but a server can have multiple tablets that need to be 
scanned, correct? Don't we create one scanner per node?


PS1, Line 125: ranges
tokens and ranges are sort of used interchangeably. Maybe we should keep one?


http://gerrit.cloudera.org:8080/#/c/4120/1/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

PS1, Line 43: 'range'
update comment?


http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

PS1, Line 64: (<=, >=, ==)
Comment needs to be updated.


PS1, Line 77: // TODO use the portion of this list that pertains to keys to do 
partition pruning.
I think we can remove this TODO now, no?


PS1, Line 233: <=, >=, ==
Update comment


PS1, Line 264: if (!(predicate.getChild(0) instanceof SlotRef)) return null;
 : if (!predicate.getChild(1).isLiteral()) return null;
 : 
 : SlotRef ref = (SlotRef)predicate.getChild(0);
 : String colName = ref.getDesc().getColumn().getName();
 : ColumnSchema column = table.getSchema().getColumn(colName);
 : ComparisonOp op = 
getKuduOperator(((BinaryPredicate)predicate).getOp());
 : if (op == null) return null;
How about putting all the logic of checking if a predicate can be pushed to 
Kudu into a single function? That would include this block and also some parts 
from extractKuduConjuncts() fn (e.g. L242-247).


http://gerrit.cloudera.org:8080/#/c/4120/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

PS1, Line 281: KUDU KEYRANGE 
Is this part used in any of the planner tests? I don't remember seeing it 
anywhere.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-08-24 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4120

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..

IMPALA-3481: Use Kudu ScanToken API for scan ranges

Switches the planner and KuduScanNode to use Kudu's new
ScanToken API instead of explicitly constructing scan ranges
for all tablets of a table, regardless of whether they were
needed. The ScanToken API allows Impala to specify the
projected columns and predicates during planning, and Kudu
returns a set of 'scan tokens' that represent a scanner for
each tablet that needs to be scanned, pruning the tablets
that will not be necessary. The scan tokens can be
serialized and distributed to the scan nodes, which can then
deserialize them into Kudu scanner objects. Upon
deserialization, the scan token has all scan parameters
already, including the 'pushed down' predicates. Impala no
longer needs to send the Kudu predicates to the BE and
convert them at the scan node.

This change also fixes:
1) IMPALA-4016: Avoid materializing slots only referenced
by Kudu conjuncts
2) IMPALA-3874: Predicates are not always pushed to Kudu

TODO: Consider additional planning improvements.

Testing: Updated the existing tests, verified everything
works as expected. Some BE tests no longer make sense and
they were removed.

TODO: When the TPC-DS schema is loaded for Kudu (follow-up
  work), add TPC-DS tests that demonstrate pruning.

Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
---
M be/src/exec/kudu-scan-node-test.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/scheduling/simple-scheduler.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/com/cloudera/impala/analysis/Expr.java
M fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
M fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
17 files changed, 337 insertions(+), 993 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4120/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs