[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
Added frontend planner tests.

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Reviewed-on: http://gerrit.cloudera.org:8080/7560
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
10 files changed, 131 insertions(+), 40 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1138/

-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs, Alex Behm, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7560

to look at the new patch set (#11).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
Added frontend planner tests.

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
10 files changed, 131 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/11
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7560/10/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

PS10, Line 170: table
  :* set, but no ot
> This isn't true, for HDFS tables we set the partition map (to the 'default 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 10: Code-Review+2

(2 comments)

Please address MJ's last comment.

http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

PS8, Line 200: fail("Failed to add table, external kudu table 
expected:\n" + createTableSql);
 :   }
 :   t
> Right, makes sense. Yeah, so we're already tied to Kudu in planner tests. I
Current approach seems fine to me. A test only depends on Kudu if it uses 
addTestTable() with a Kudu table. Seems no different from existing tests.


http://gerrit.cloudera.org:8080/#/c/7560/10/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

PS10, Line 170: but no
  :* other metadata
> This isn't true, for HDFS tables we set the partition map (to the 'default 
I like your suggestion.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

PS8, Line 200: fail("Failed to add table, external kudu table 
expected:\n" + createTableSql);
 :   }
 :   t
> This wont work for managed kudu tables because this method requires we only
Right, makes sense. Yeah, so we're already tied to Kudu in planner tests. I 
guess this is the downside of not keeping this metadata in the catalog. I agree 
then it's not worth trying to address here. I'm fine with this approach if Alex 
is as well.


http://gerrit.cloudera.org:8080/#/c/7560/10/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

PS10, Line 170: but no
  :* other metadata
This isn't true, for HDFS tables we set the partition map (to the 'default 
partition').

It might just be easier and more clear to say:

Add a new dummy table to the catalog based on the given CREATE TABLE sql. The 
returned table only has its metadata partially set, but is capable of being 
planned.

I'd be curious to hear what Alex thinks.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7560

to look at the new patch set (#10).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
Added frontend planner tests.

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
10 files changed, 129 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException
> Move this above loadSchema() so we have these functions clustered.
Done


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

Line 513:   public boolean hasStorageLayerConjuncts() {
> single line? (and elsewhere)
Done


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 526:   public boolean hasStorageLayerConjuncts() {
> single line?
Done


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 217:* Returns true if the node contains any conjuncts to be processed 
by Impala
> Returns true if this node has conjuncts to be evaluated by Impala against t
Done


Line 224:* Returns true if the node contains any conjuncts pushed to the 
underlying storage
> Returns true if this node has conjuncts to be evaluated by the underlying s
Done


Line 228: // Derived classes must override this method if they have any 
pushed conjuncts
> don't think we need this, then single line
Done


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

Line 169:* Add a new dummy HDFS or external Kudu table to the catalog based 
on the given CREATE
> Remove the "HDFS or external Kudu" part here. You already explain the limit
oops, forgot to remove this.


Line 199: }else if (dummyTable instanceof KuduTable) {
> space after "}"
Done


PS8, Line 200:   if (!Table.isExternalTable(msTbl)) {
 : fail("Failed to add table, external kudu table 
expected:\n" + createTableSql);
 :   }
> My concern with this approach is that this now does depend on Kudu, and I'm
This wont work for managed kudu tables because this method requires we only 
create a dummy table in the catalog and no actual table is created in kudu. 
This breaks planner code because when it creates a kudu scan node, the init 
method in the scan node connects to kudu client and checks if the table exists 
or not then throws an exception if it doesn't.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7560

to look at the new patch set (#9).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
Added frontend planner tests.

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
10 files changed, 129 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

PS8, Line 200:   if (!Table.isExternalTable(msTbl)) {
 : fail("Failed to add table, external kudu table 
expected:\n" + createTableSql);
 :   }
My concern with this approach is that this now does depend on Kudu, and I'm not 
sure we have to. Is there a reason not to make this work for managed Kudu 
tables, and just set the KuduTable.primaryKeyColumnNames_ from the 
stmt.getKuduPartitionParams()? Then I think this doesn't require calling into 
Kudu. Also it's less restrictive than only supporting Kudu tables that already 
exist.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 8:

(9 comments)

Nice! Looks much better. Minor nits left

http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException
> Extracted relevant code from KuduTable.load() for loading metadata from kud
Move this above loadSchema() so we have these functions clustered.


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

Line 513:   public boolean hasStorageLayerConjuncts() {
single line? (and elsewhere)


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 526:   public boolean hasStorageLayerConjuncts() {
single line?


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 217:* Returns true if the node contains any conjuncts to be processed 
by Impala
Returns true if this node has conjuncts to be evaluated by Impala against the 
scan tuple.

(This is more precise, because it excludes conjuncts on nested collections as 
well as dictionary and/or min/max filters which are also evaluated by Impala)


Line 224:* Returns true if the node contains any conjuncts pushed to the 
underlying storage
Returns true if this node has conjuncts to be evaluated by the underlying 
storage engine.


Line 228: // Derived classes must override this method if they have any 
pushed conjuncts
don't think we need this, then single line


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

Line 169:* Add a new dummy HDFS or external Kudu table to the catalog based 
on the given CREATE
Remove the "HDFS or external Kudu" part here. You already explain the 
limitations later (which seems more suitable to me)


Line 199: }else if (dummyTable instanceof KuduTable) {
space after "}"


http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 318: addTestTable("CREATE EXTERNAL TABLE kudu_planner_test.no_stats 
STORED AS KUDU " +
This is pretty awesome, thanks for adding it!


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException
Extracted relevant code from KuduTable.load() for loading metadata from kudu 
into a separate public method and using that in FrontendTestBase.addTestTable() 
to create a temp table in catalog.

Please let me know if this change is acceptable for adding support for kudu 
tables in FrontendTestBase.addTestTable(). If not, I will revert back to the 
previous patch set.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-23 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7560

to look at the new patch set (#8).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
Added frontend planner tests.

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
10 files changed, 141 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-22 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7560

to look at the new patch set (#7).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
For (A): Added a frontend planner test for kudu tables
For (B): Added an end to end planner test for kudu and a query test
for datasource tables

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M tests/query_test/test_kudu.py
8 files changed, 92 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-22 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 219: // Derived classes should override this method if they have any 
pushed conjuncts
> Fair point. To me it's not confusing, but I'll defer to MJ. Instead of addi
As discussed, using hasScanConjuncts to denote conjuncts being used by impala, 
and hasStorageLayerConjuncts to denote those pushed down to the storage layer.


http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

Line 57: valid_ = false;
> MJ, does this change even make sense? Suppose we have a query with a limit 
as discussed, this code is correct as we need multiple impala instances to make 
kudu scans faster


http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

Line 132:  QUERY
> I don't think that's true. This is what I get:
Done


http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used 
if table stats are
> I understand. My question is whether you have tried extending addTestTable(
Still working on a way to make this work. What I have found till now:

addTestTable must be able to create a table only in the catalog and not in the 
actual storage layer. The problem is that when a kuduScan node is created in 
the planning phase it connects to the kudu client to make sure that the table 
exists. Now to get around this, if we create an external table, we will still 
have to copy code out of KuduTable.load() in order to load the columns from the 
kudu client. KuduTable.load() cannot be used directly as it requires a handle 
to HMS instance. Either this, or we refactor the code in KuduTable.load() to 
extract the required code into a separate method and call that.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 219:   public boolean hasPushedConjuncts() {
> Makes sense, but as Matt pointed out earlier in a discussion that having si
Fair point. To me it's not confusing, but I'll defer to MJ. Instead of adding 
to the conjunct confusion, I suggest we override getInputCardinality() in every 
scan node. The ScanNode already overrides the default implementation in 
PlanNode, so the change would only continue the specialization downwards.

I understand that we also use this code in the max rows visitor, but I'm not 
sure the use makes sense there (see my comment in the other file).


http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

Line 57:   !scan.hasPushedConjuncts( {
MJ, does this change even make sense? Suppose we have a query with a limit and 
only Kudu conjuncts. From Impala's point of view the input cardinality is still 
the limit, so why run the scan on multiple impalads? Will Kudu be faster in 
scanning if we query it with multiple impalads? (Similar question for the 
datasource scan)


http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

Line 132:  QUERY
> This code change has no effect on the plan created since for a data source 
I don't think that's true. This is what I get:

[localhost:21000] > explain select * from alltypes_datasource;
Query: explain select * from alltypes_datasource
++
| Explain String
 |
++
| Max Per-Host Resource Reservation: Memory=0B  
 |
| Per-Host Resource Estimates: Memory=1.00GB
 |
| WARNING: The following tables are missing relevant table and/or column 
statistics. |
| functional.alltypes_datasource
 |
|   
 |
| PLAN-ROOT SINK
 |
| | 
 |
| 01:EXCHANGE [UNPARTITIONED]   
 |
| | 
 |
| 00:SCAN DATA SOURCE [functional.alltypes_datasource]  
 |
++

With the single-node optimization we should not have an exchange.


http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used 
if table stats are
> Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs 
I understand. My question is whether you have tried extending addTestTable() to 
work for this use case. Extending our FE unit testing capabilities is 
preferable to adding a one-off EE test.

If the extension is too hard, that's fine, but I'm not yet convinced that it is.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-21 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java:

Line 367:   public boolean hasPushedConjuncts() { return 
!acceptedConjuncts_.isEmpty(); }
> Do we have the same issue with 'filters_' in the HBaseScanNode?
No, for HBaseScanNode, if filters are created they are not removed from the 
conjuncts member variable


http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 219:   public boolean hasPushedConjuncts() {
> It seems strange and unnecessary to distinguish the "pushed" from "non-push
Makes sense, but as Matt pointed out earlier in a discussion that having 
similar names like hasConjuncts() or hasScanConjuncts() might make the 
interface to this class confusing while comparing hasConjuncts() and 
getConjuncts().isEmpty().

Would it make more sense to have a more specific name like he suggested 
earlier: getEffectiveScanConjuncts() ?


http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 317: TQueryOptions options = defaultQueryOptions();
> What do we need this change for? This adds a lot expected test output.
Done


http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 774:  DISTRIBUTEDPLAN
> We don't need explain level VERBOSE to test this.
Done


http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

Line 132:  QUERY
> a PlannerTest is more suitable
This code change has no effect on the plan created since for a data source a 
single node is used in the plan regardless of other optimizations. So, to make 
sure that small query optimization is not used, I had to check the runtime 
profile instead and make sure query options set by small query optimization 
dont appear


http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used 
if table stats are
> Have you tried to see if we can modify  FrontendTestBase.addTestTable() to 
Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs 
tables.


http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

Line 164: # Reset this exec_option to check default behaviour of any 
planner optimization tests
> Why? This way we lose coverage of the other case. My understanding is that 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-21 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7560

to look at the new patch set (#6).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
For (A): Added a frontend planner test for kudu tables
For (B): Added an end to end planner test for kudu and a query test
for datasource tables

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
M tests/query_test/test_kudu.py
7 files changed, 70 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java:

Line 367:   public boolean hasPushedConjuncts() { return 
!acceptedConjuncts_.isEmpty(); }
Do we have the same issue with 'filters_' in the HBaseScanNode?


http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 219:   public boolean hasPushedConjuncts() {
It seems strange and unnecessary to distinguish the "pushed" from "non-pushed" 
conjuncts, especially since the concept is only relevant for certain scans. I'd 
prefer a function  hasConjuncts() or hasScanConjuncts() or similar. 
Alternatively, each ScanNode subclass could directly override 
getInputCardinality().


http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 317: TQueryOptions options = defaultQueryOptions();
What do we need this change for? This adds a lot expected test output.


http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 774:  DISTRIBUTEDPLAN
We don't need explain level VERBOSE to test this.


http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

Line 132:  QUERY
a PlannerTest is more suitable


http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used 
if table stats are
Have you tried to see if we can modify  FrontendTestBase.addTestTable() to 
serve this use case? It's unfortunate if we have to split up the tests like 
this.


http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

Line 164: # Reset this exec_option to check default behaviour of any 
planner optimization tests
Why? This way we lose coverage of the other case. My understanding is that in 
exhaustive we run with and without this option which, so I think we should 
leave this as is.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

Line 166: del 
vector.get_value('exec_option')['exec_single_node_rows_threshold']
> I tried running all tests in this class to check for this and it seems the 
Hmm maybe it was fixed or something.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-17 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7560

to look at the new patch set (#5).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
For (A): Added a frontend planner test for kudu tables
For (B): Added an end to end planner test for kudu and a query test
for datasource tables

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
M tests/query_test/test_kudu.py
M tests/query_test/test_queries.py
9 files changed, 687 insertions(+), 262 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-17 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/7560/4/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java:

Line 367
> formatting
Done


http://gerrit.cloudera.org:8080/#/c/7560/3/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 527: return !kuduConjuncts_.isEmpty();
> nit oneline
Done


http://gerrit.cloudera.org:8080/#/c/7560/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

PS3, Line 219: (){
> nit: missing space
Done


PS3, Line 219: (){
> You are correct, that should be fixed as well.
Done


Line 219:   public boolean hasPushedConjuncts(){
> I think DataSourceScanNode can also push conjuncts.
Done


http://gerrit.cloudera.org:8080/#/c/7560/4/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

PS4, Line 138: 
> can you put the full key here, i.e. + "(non default)" or whatever? That way
adding "(non default)" for now. Will switch to whatever we decide to add in 
IMPALA-5784 for representing changes made by planner.


http://gerrit.cloudera.org:8080/#/c/7560/3/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 1007:   def test_missing_table_stats(self, cursor, kudu_client, 
unique_database):
> This needs @SkipIfNotHdfsMinicluster.plans since it's assuming a 3 node min
Done


PS3, Line 1019: self.client.execute("set explain_level=3")
> you can actually use client.set_configuration(config_map)
Done


PS3, Line 1024: assert str(result).find("hosts=3 instances=3") != -1
> the more python-y way to say this is
Done


http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

PS4, Line 1005: 
> creating a table.
Done


PS4, Line 1010:  k
> JIRA number:
Done


http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

Line 165: 
> nit: missing .
Done


Line 166:   def test_distinct_estimate(self, vector):
> I think this is modifying the vector in-place. I.e. the change can leak out
I tried running all tests in this class to check for this and it seems the 
changes to the vector do not carry on to other tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 4: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

Line 165: # that rely on it
nit: missing .


Line 166: del 
vector.get_value('exec_option')['exec_single_node_rows_threshold']
I think this is modifying the vector in-place. I.e. the change can leak out to 
other places. I'm not a py.test expert but we definitely copy the vector in 
other places to avoid this problem.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 4:

also it looks like this needs a rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..


Patch Set 4: Code-Review+1

(4 comments)

Let's see if Alex and or Tim want to take another look

http://gerrit.cloudera.org:8080/#/c/7560/4/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java:

Line 367:   public boolean hasPushedConjuncts(){ return 
!acceptedConjuncts_.isEmpty(); }
formatting


http://gerrit.cloudera.org:8080/#/c/7560/4/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

PS4, Line 138: s(
can you put the full key here, i.e. + "(non default)" or whatever? That way 
we'll be sure to update it after IMPALA-5784 and then I think this will become 
a more convincing test case because we know the planner was the one that set 
this option.


http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

PS4, Line 1005: creating table
creating a table.


PS4, Line 1010: "T
JIRA number:


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

2017-08-16 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#4).

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
..

IMPALA-5602: Fix query optimization for kudu and datasource tables

Fix a bug where the following queries on kudu and datasource tables
were incorrectly being optimized as a 'small query' and therefore
running on a single node with a single scanner thread:

(A) that have all their predicates pushed to the underlying storage
layer and have a limit
(B) table stats missing + Conditions in (A)

Testing:
For (A): Added a frontend planner test for kudu tables
For (B): Added an end to end planner test for kudu and a query test
for datasource tables

Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
---
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
M tests/query_test/test_kudu.py
M tests/query_test/test_queries.py
9 files changed, 673 insertions(+), 256 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong