[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

Alex Behm (Code Review) Mon, 21 Aug 2017 22:10:51 -0700

Alex Behm has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource 
tables
......................................................................



Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 219:   public boolean hasPushedConjuncts() {
> Makes sense, but as Matt pointed out earlier in a discussion that having si
Fair point. To me it's not confusing, but I'll defer to MJ. Instead of adding 
to the conjunct confusion, I suggest we override getInputCardinality() in every 
scan node. The ScanNode already overrides the default implementation in 
PlanNode, so the change would only continue the specialization downwards.

I understand that we also use this code in the max rows visitor, but I'm not 
sure the use makes sense there (see my comment in the other file).


http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

Line 57:               !scan.hasPushedConjuncts()))) {
MJ, does this change even make sense? Suppose we have a query with a limit and 
only Kudu conjuncts. From Impala's point of view the input cardinality is still 
the limit, so why run the scan on multiple impalads? Will Kudu be faster in 
scanning if we query it with multiple impalads? (Similar question for the 
datasource scan)


http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

Line 132: ---- QUERY
> This code change has no effect on the plan created since for a data source 
I don't think that's true. This is what I get:

[localhost:21000] > explain select * from alltypes_datasource;
Query: explain select * from alltypes_datasource
+------------------------------------------------------------------------------------+
| Explain String                                                                
     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B                                  
     |
| Per-Host Resource Estimates: Memory=1.00GB                                    
     |
| WARNING: The following tables are missing relevant table and/or column 
statistics. |
| functional.alltypes_datasource                                                
     |
|                                                                               
     |
| PLAN-ROOT SINK                                                                
     |
| |                                                                             
     |
| 01:EXCHANGE [UNPARTITIONED]                                                   
     |
| |                                                                             
     |
| 00:SCAN DATA SOURCE [functional.alltypes_datasource]                          
     |
+------------------------------------------------------------------------------------+

With the single-node optimization we should not have an exchange.


http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 1010:     """IMPALA-5602: Test that 'small query' optimization is not used 
if table stats are
> Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs 
I understand. My question is whether you have tried extending addTestTable() to 
work for this use case. Extending our FE unit testing capabilities is 
preferable to adding a one-off EE test.

If the extension is too hard, that's fine, but I'm not yet convinced that it is.


-- 
To view, visit http://gerrit.cloudera.org:8080/7560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables

Reply via email to