Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 )
Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction ...................................................................... Patch Set 13: (21 comments) http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@31 PS12, Line 31: mode > Nit: modes. Done http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@35 PS12, Line 35: RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE > Could you elaborate on how it can be used (what values are valid, what they Done http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc File be/src/service/query-options-test.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315 PS12, Line 315: // List of pairs of Key and boolean flag on whether the option is inclusive of 0 and 1. > I'd put it right before the loop. Done http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@316 PS12, Line 316: pair<OptionDef<double>, bool> case_set[]{ > This is not used. Done http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc@1170 PS12, Line 1170: ; > No need for the 'float' literal, especially as it is a double. Done http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@837 PS12, Line 837: exceed > Nit: exceeds. Done http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@849 PS12, Line 849: Default > Nit: Defaults. Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474 PS12, Line 474: only applies at HDFS columnar file > Now 'isAllColumnarScanner' is replaced with 'evalAtRowLevel', Kudu is also Yes. Mentioned Kudu in comment. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@485 PS12, Line 485: reduc > Maybe 'reduced' would be better, "lower o. c. estimate" may suggest a "lowe Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@486 PS12, Line 486: at > Nit: is? Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS12, Line 488: below > It can be a bit confusing that this join node is at the bottom of the stack Added "in node tree". http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514 PS12, Line 514: this, reducedCardinality, partitionSelectivities); > I think that it would be more readable if the the body of the loop would be Added RuntimeFilter.reducedCardinalityForScanNode(). http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@531 PS12, Line 531: dinality, scaledPartSel); : } > I am not sure if this is correct - all max selectivities per column will be Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@561 PS12, Line 561: > Can it be multiple join nodes or just one? There can be multiple join nodes in nodeStack and they connect with each other at probe pipeline. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@565 PS12, Line 565: > Could you also mention 'reductionScale' here? Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578 PS12, Line 578: dinality_ * scanRangeSelectivity_ > Could extract this as 'currentCardinality'. Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@591 PS12, Line 591: // with the least > Shouldn't it be 'partitionSelectivity'? Or is it the same thing? It is the same thing. I name it scanRangeSelectivity_ because from ScanNode perspective, it will be applied to reduce the estimated number of scan range that survive file-level filter and actually being read. Scan range might be 1 range per file or multiple range per file, depending on target file system. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@609 PS12, Line 609: > Why do we leave out the last one? Clarified in the new comment below this. http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test: http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@11 PS12, Line 11: ss_sold_date_sk = d_date_sk > Can you also add a test when there is an expression in the key to test rele Added as Case 07 and Case 08 in patch set 13. http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@91 PS12, Line 91: _ticket_number is n > Is it possible to see the effect of runtime filters on partition selectivit Added scan-range-selectivity next to max-scan-range-rows in query plan. http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@299 PS12, Line 299: itio > Can you also add a test for semi left join? Left anti join is a variant of left semi join, per this code in JoinOperator.java public boolean isLeftSemiJoin() { return this == JoinOperator.LEFT_SEMI_JOIN || this == JoinOperator.LEFT_ANTI_JOIN || this == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN || this == JoinOperator.ICEBERG_DELETE_JOIN; } -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Mon, 27 Nov 2023 22:55:11 +0000 Gerrit-HasComments: Yes
