[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Reviewed-on: http://gerrit.cloudera.org:8080/7560 Reviewed-by: Matthew JacobsTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 131 insertions(+), 40 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1138/ -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Alex Behm, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#11). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 131 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/11 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7560/10/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: PS10, Line 170: table :* set, but no ot > This isn't true, for HDFS tables we set the partition map (to the 'default Done -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 10: Code-Review+2 (2 comments) Please address MJ's last comment. http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: PS8, Line 200: fail("Failed to add table, external kudu table expected:\n" + createTableSql); : } : t > Right, makes sense. Yeah, so we're already tied to Kudu in planner tests. I Current approach seems fine to me. A test only depends on Kudu if it uses addTestTable() with a Kudu table. Seems no different from existing tests. http://gerrit.cloudera.org:8080/#/c/7560/10/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: PS10, Line 170: but no :* other metadata > This isn't true, for HDFS tables we set the partition map (to the 'default I like your suggestion. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: PS8, Line 200: fail("Failed to add table, external kudu table expected:\n" + createTableSql); : } : t > This wont work for managed kudu tables because this method requires we only Right, makes sense. Yeah, so we're already tied to Kudu in planner tests. I guess this is the downside of not keeping this metadata in the catalog. I agree then it's not worth trying to address here. I'm fine with this approach if Alex is as well. http://gerrit.cloudera.org:8080/#/c/7560/10/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: PS10, Line 170: but no :* other metadata This isn't true, for HDFS tables we set the partition map (to the 'default partition'). It might just be easier and more clear to say: Add a new dummy table to the catalog based on the given CREATE TABLE sql. The returned table only has its metadata partially set, but is capable of being planned. I'd be curious to hear what Alex thinks. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#10). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 129 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/10 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException > Move this above loadSchema() so we have these functions clustered. Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java: Line 513: public boolean hasStorageLayerConjuncts() { > single line? (and elsewhere) Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 526: public boolean hasStorageLayerConjuncts() { > single line? Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 217:* Returns true if the node contains any conjuncts to be processed by Impala > Returns true if this node has conjuncts to be evaluated by Impala against t Done Line 224:* Returns true if the node contains any conjuncts pushed to the underlying storage > Returns true if this node has conjuncts to be evaluated by the underlying s Done Line 228: // Derived classes must override this method if they have any pushed conjuncts > don't think we need this, then single line Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: Line 169:* Add a new dummy HDFS or external Kudu table to the catalog based on the given CREATE > Remove the "HDFS or external Kudu" part here. You already explain the limit oops, forgot to remove this. Line 199: }else if (dummyTable instanceof KuduTable) { > space after "}" Done PS8, Line 200: if (!Table.isExternalTable(msTbl)) { : fail("Failed to add table, external kudu table expected:\n" + createTableSql); : } > My concern with this approach is that this now does depend on Kudu, and I'm This wont work for managed kudu tables because this method requires we only create a dummy table in the catalog and no actual table is created in kudu. This breaks planner code because when it creates a kudu scan node, the init method in the scan node connects to kudu client and checks if the table exists or not then throws an exception if it doesn't. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#9). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 129 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/9 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: PS8, Line 200: if (!Table.isExternalTable(msTbl)) { : fail("Failed to add table, external kudu table expected:\n" + createTableSql); : } My concern with this approach is that this now does depend on Kudu, and I'm not sure we have to. Is there a reason not to make this work for managed Kudu tables, and just set the KuduTable.primaryKeyColumnNames_ from the stmt.getKuduPartitionParams()? Then I think this doesn't require calling into Kudu. Also it's less restrictive than only supporting Kudu tables that already exist. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (9 comments) Nice! Looks much better. Minor nits left http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException > Extracted relevant code from KuduTable.load() for loading metadata from kud Move this above loadSchema() so we have these functions clustered. http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java: Line 513: public boolean hasStorageLayerConjuncts() { single line? (and elsewhere) http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 526: public boolean hasStorageLayerConjuncts() { single line? http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 217:* Returns true if the node contains any conjuncts to be processed by Impala Returns true if this node has conjuncts to be evaluated by Impala against the scan tuple. (This is more precise, because it excludes conjuncts on nested collections as well as dictionary and/or min/max filters which are also evaluated by Impala) Line 224:* Returns true if the node contains any conjuncts pushed to the underlying storage Returns true if this node has conjuncts to be evaluated by the underlying storage engine. Line 228: // Derived classes must override this method if they have any pushed conjuncts don't think we need this, then single line http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: Line 169:* Add a new dummy HDFS or external Kudu table to the catalog based on the given CREATE Remove the "HDFS or external Kudu" part here. You already explain the limitations later (which seems more suitable to me) Line 199: }else if (dummyTable instanceof KuduTable) { space after "}" http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 318: addTestTable("CREATE EXTERNAL TABLE kudu_planner_test.no_stats STORED AS KUDU " + This is pretty awesome, thanks for adding it! -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException Extracted relevant code from KuduTable.load() for loading metadata from kudu into a separate public method and using that in FrontendTestBase.addTestTable() to create a temp table in catalog. Please let me know if this change is acceptable for adding support for kudu tables in FrontendTestBase.addTestTable(). If not, I will revert back to the previous patch set. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#8). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 141 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/8 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#7). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: For (A): Added a frontend planner test for kudu tables For (B): Added an end to end planner test for kudu and a query test for datasource tables Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M tests/query_test/test_kudu.py 8 files changed, 92 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/7 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 219: // Derived classes should override this method if they have any pushed conjuncts > Fair point. To me it's not confusing, but I'll defer to MJ. Instead of addi As discussed, using hasScanConjuncts to denote conjuncts being used by impala, and hasStorageLayerConjuncts to denote those pushed down to the storage layer. http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: Line 57: valid_ = false; > MJ, does this change even make sense? Suppose we have a query with a limit as discussed, this code is correct as we need multiple impala instances to make kudu scans faster http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: Line 132: QUERY > I don't think that's true. This is what I get: Done http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used if table stats are > I understand. My question is whether you have tried extending addTestTable( Still working on a way to make this work. What I have found till now: addTestTable must be able to create a table only in the catalog and not in the actual storage layer. The problem is that when a kuduScan node is created in the planning phase it connects to the kudu client to make sure that the table exists. Now to get around this, if we create an external table, we will still have to copy code out of KuduTable.load() in order to load the columns from the kudu client. KuduTable.load() cannot be used directly as it requires a handle to HMS instance. Either this, or we refactor the code in KuduTable.load() to extract the required code into a separate method and call that. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 219: public boolean hasPushedConjuncts() { > Makes sense, but as Matt pointed out earlier in a discussion that having si Fair point. To me it's not confusing, but I'll defer to MJ. Instead of adding to the conjunct confusion, I suggest we override getInputCardinality() in every scan node. The ScanNode already overrides the default implementation in PlanNode, so the change would only continue the specialization downwards. I understand that we also use this code in the max rows visitor, but I'm not sure the use makes sense there (see my comment in the other file). http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: Line 57: !scan.hasPushedConjuncts( { MJ, does this change even make sense? Suppose we have a query with a limit and only Kudu conjuncts. From Impala's point of view the input cardinality is still the limit, so why run the scan on multiple impalads? Will Kudu be faster in scanning if we query it with multiple impalads? (Similar question for the datasource scan) http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: Line 132: QUERY > This code change has no effect on the plan created since for a data source I don't think that's true. This is what I get: [localhost:21000] > explain select * from alltypes_datasource; Query: explain select * from alltypes_datasource ++ | Explain String | ++ | Max Per-Host Resource Reservation: Memory=0B | | Per-Host Resource Estimates: Memory=1.00GB | | WARNING: The following tables are missing relevant table and/or column statistics. | | functional.alltypes_datasource | | | | PLAN-ROOT SINK | | | | | 01:EXCHANGE [UNPARTITIONED] | | | | | 00:SCAN DATA SOURCE [functional.alltypes_datasource] | ++ With the single-node optimization we should not have an exchange. http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used if table stats are > Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs I understand. My question is whether you have tried extending addTestTable() to work for this use case. Extending our FE unit testing capabilities is preferable to adding a one-off EE test. If the extension is too hard, that's fine, but I'm not yet convinced that it is. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java: Line 367: public boolean hasPushedConjuncts() { return !acceptedConjuncts_.isEmpty(); } > Do we have the same issue with 'filters_' in the HBaseScanNode? No, for HBaseScanNode, if filters are created they are not removed from the conjuncts member variable http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 219: public boolean hasPushedConjuncts() { > It seems strange and unnecessary to distinguish the "pushed" from "non-push Makes sense, but as Matt pointed out earlier in a discussion that having similar names like hasConjuncts() or hasScanConjuncts() might make the interface to this class confusing while comparing hasConjuncts() and getConjuncts().isEmpty(). Would it make more sense to have a more specific name like he suggested earlier: getEffectiveScanConjuncts() ? http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 317: TQueryOptions options = defaultQueryOptions(); > What do we need this change for? This adds a lot expected test output. Done http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 774: DISTRIBUTEDPLAN > We don't need explain level VERBOSE to test this. Done http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: Line 132: QUERY > a PlannerTest is more suitable This code change has no effect on the plan created since for a data source a single node is used in the plan regardless of other optimizations. So, to make sure that small query optimization is not used, I had to check the runtime profile instead and make sure query options set by small query optimization dont appear http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used if table stats are > Have you tried to see if we can modify FrontendTestBase.addTestTable() to Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs tables. http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_queries.py File tests/query_test/test_queries.py: Line 164: # Reset this exec_option to check default behaviour of any planner optimization tests > Why? This way we lose coverage of the other case. My understanding is that Done -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#6). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: For (A): Added a frontend planner test for kudu tables For (B): Added an end to end planner test for kudu and a query test for datasource tables Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test M tests/query_test/test_kudu.py 7 files changed, 70 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/6 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java: Line 367: public boolean hasPushedConjuncts() { return !acceptedConjuncts_.isEmpty(); } Do we have the same issue with 'filters_' in the HBaseScanNode? http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 219: public boolean hasPushedConjuncts() { It seems strange and unnecessary to distinguish the "pushed" from "non-pushed" conjuncts, especially since the concept is only relevant for certain scans. I'd prefer a function hasConjuncts() or hasScanConjuncts() or similar. Alternatively, each ScanNode subclass could directly override getInputCardinality(). http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 317: TQueryOptions options = defaultQueryOptions(); What do we need this change for? This adds a lot expected test output. http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 774: DISTRIBUTEDPLAN We don't need explain level VERBOSE to test this. http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: Line 132: QUERY a PlannerTest is more suitable http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used if table stats are Have you tried to see if we can modify FrontendTestBase.addTestTable() to serve this use case? It's unfortunate if we have to split up the tests like this. http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_queries.py File tests/query_test/test_queries.py: Line 164: # Reset this exec_option to check default behaviour of any planner optimization tests Why? This way we lose coverage of the other case. My understanding is that in exhaustive we run with and without this option which, so I think we should leave this as is. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_queries.py File tests/query_test/test_queries.py: Line 166: del vector.get_value('exec_option')['exec_single_node_rows_threshold'] > I tried running all tests in this class to check for this and it seems the Hmm maybe it was fixed or something. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#5). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: For (A): Added a frontend planner test for kudu tables For (B): Added an end to end planner test for kudu and a query test for datasource tables Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test M tests/query_test/test_kudu.py M tests/query_test/test_queries.py 9 files changed, 687 insertions(+), 262 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/5 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/7560/4/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java: Line 367 > formatting Done http://gerrit.cloudera.org:8080/#/c/7560/3/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 527: return !kuduConjuncts_.isEmpty(); > nit oneline Done http://gerrit.cloudera.org:8080/#/c/7560/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: PS3, Line 219: (){ > nit: missing space Done PS3, Line 219: (){ > You are correct, that should be fixed as well. Done Line 219: public boolean hasPushedConjuncts(){ > I think DataSourceScanNode can also push conjuncts. Done http://gerrit.cloudera.org:8080/#/c/7560/4/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: PS4, Line 138: > can you put the full key here, i.e. + "(non default)" or whatever? That way adding "(non default)" for now. Will switch to whatever we decide to add in IMPALA-5784 for representing changes made by planner. http://gerrit.cloudera.org:8080/#/c/7560/3/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1007: def test_missing_table_stats(self, cursor, kudu_client, unique_database): > This needs @SkipIfNotHdfsMinicluster.plans since it's assuming a 3 node min Done PS3, Line 1019: self.client.execute("set explain_level=3") > you can actually use client.set_configuration(config_map) Done PS3, Line 1024: assert str(result).find("hosts=3 instances=3") != -1 > the more python-y way to say this is Done http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS4, Line 1005: > creating a table. Done PS4, Line 1010: k > JIRA number: Done http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_queries.py File tests/query_test/test_queries.py: Line 165: > nit: missing . Done Line 166: def test_distinct_estimate(self, vector): > I think this is modifying the vector in-place. I.e. the change can leak out I tried running all tests in this class to check for this and it seems the changes to the vector do not carry on to other tests. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 4: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_queries.py File tests/query_test/test_queries.py: Line 165: # that rely on it nit: missing . Line 166: del vector.get_value('exec_option')['exec_single_node_rows_threshold'] I think this is modifying the vector in-place. I.e. the change can leak out to other places. I'm not a py.test expert but we definitely copy the vector in other places to avoid this problem. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 4: also it looks like this needs a rebase -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 4: Code-Review+1 (4 comments) Let's see if Alex and or Tim want to take another look http://gerrit.cloudera.org:8080/#/c/7560/4/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java: Line 367: public boolean hasPushedConjuncts(){ return !acceptedConjuncts_.isEmpty(); } formatting http://gerrit.cloudera.org:8080/#/c/7560/4/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: PS4, Line 138: s( can you put the full key here, i.e. + "(non default)" or whatever? That way we'll be sure to update it after IMPALA-5784 and then I think this will become a more convincing test case because we know the planner was the one that set this option. http://gerrit.cloudera.org:8080/#/c/7560/4/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS4, Line 1005: creating table creating a table. PS4, Line 1010: "T JIRA number: -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has uploaded a new patch set (#4). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: For (A): Added a frontend planner test for kudu tables For (B): Added an end to end planner test for kudu and a query test for datasource tables Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test M tests/query_test/test_kudu.py M tests/query_test/test_queries.py 9 files changed, 673 insertions(+), 256 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/4 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong