Alex Behm has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables ......................................................................
Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 219: public boolean hasPushedConjuncts() { > Makes sense, but as Matt pointed out earlier in a discussion that having si Fair point. To me it's not confusing, but I'll defer to MJ. Instead of adding to the conjunct confusion, I suggest we override getInputCardinality() in every scan node. The ScanNode already overrides the default implementation in PlanNode, so the change would only continue the specialization downwards. I understand that we also use this code in the max rows visitor, but I'm not sure the use makes sense there (see my comment in the other file). http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: Line 57: !scan.hasPushedConjuncts()))) { MJ, does this change even make sense? Suppose we have a query with a limit and only Kudu conjuncts. From Impala's point of view the input cardinality is still the limit, so why run the scan on multiple impalads? Will Kudu be faster in scanning if we query it with multiple impalads? (Similar question for the datasource scan) http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: Line 132: ---- QUERY > This code change has no effect on the plan created since for a data source I don't think that's true. This is what I get: [localhost:21000] > explain select * from alltypes_datasource; Query: explain select * from alltypes_datasource +------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=0B | | Per-Host Resource Estimates: Memory=1.00GB | | WARNING: The following tables are missing relevant table and/or column statistics. | | functional.alltypes_datasource | | | | PLAN-ROOT SINK | | | | | 01:EXCHANGE [UNPARTITIONED] | | | | | 00:SCAN DATA SOURCE [functional.alltypes_datasource] | +------------------------------------------------------------------------------------+ With the single-node optimization we should not have an exchange. http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used if table stats are > Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs I understand. My question is whether you have tried extending addTestTable() to work for this use case. Extending our FE unit testing capabilities is preferable to adding a one-off EE test. If the extension is too hard, that's fine, but I'm not yet convinced that it is. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
