[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()).

The reduced cardinality is stored in new fields 'filteredCardinality_' and 'filteredInputCardinality_', separate from existing fields 'cardinality_' and 'inputCardinality_'. Future work should merge the new cardinality fields with the old cardinality fields after we can validate that the cardinality reduction does not regress memory estimation.

While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead toward ProcessingCost reduction, lower scan fragment parallelism, lower CpuAsk, and increase the chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism or memory estimates.

This patch also adds development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a range of [0.0..1.0] that controls the cardinality reduction scale from runtime filter analysis to help with benchmarking and disabling cardinality reduction if needed (by setting to 0.0). Default to 1.0.

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Ran full TPC-DS 3TB benchmark and see no regression due to query plan change.
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Reviewed-on: http://gerrit.cloudera.org:8080/20498
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M
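The cardinality reduction and the RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE option described in the commit message above can be illustrated with a minimal Java sketch. This is not the code from the patch: the class and method names are hypothetical, negative cardinalities are treated as "unknown" only for the purpose of this example, and the linear interpolation between the original and the reduced estimate is an assumption. The commit message only states that 0.0 disables the reduction and 1.0 is the default.

```java
// Hypothetical sketch, not Impala's implementation.
final class FilteredCardinalitySketch {

  // Applies a runtime filter selectivity in [0, 1] to a row-count estimate.
  // Negative or zero estimates are passed through unchanged (assumption).
  static long applySelectivity(long cardinality, double selectivity) {
    if (cardinality <= 0) return cardinality;
    double clamped = Math.max(0.0, Math.min(1.0, selectivity));
    // Keep at least one row so a very selective filter never yields zero.
    return Math.max(1L, Math.round(cardinality * clamped));
  }

  // Scales how much of the reduction is applied. ASSUMPTION: linear
  // interpolation, where 0.0 keeps the original estimate and 1.0 applies
  // the full reduction.
  static long scaleReduction(long original, long reduced, double scale) {
    if (original < 0 || reduced < 0 || reduced >= original) return original;
    double clamped = Math.max(0.0, Math.min(1.0, scale));
    return original - Math.round((original - reduced) * clamped);
  }

  public static void main(String[] args) {
    long original = 1_000_000L;
    long reduced = applySelectivity(original, 0.01);
    System.out.println("scale=0.0 -> " + scaleReduction(original, reduced, 0.0));
    System.out.println("scale=0.5 -> " + scaleReduction(original, reduced, 0.5));
    System.out.println("scale=1.0 -> " + scaleReduction(original, reduced, 1.0));
  }
}
```

With a scale of 0.0 the sketch returns the original estimate, which mirrors the "disabling cardinality reduction" behavior the commit message describes.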
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 20: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 19 Dec 2023 04:27:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 19: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 18 Dec 2023 23:43:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#19).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 20: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10073/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 18 Dec 2023 23:46:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 20: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 18 Dec 2023 23:46:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14755/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 18 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 16 Dec 2023 04:49:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 18: ps18 is a rebase to resolve a merge conflict. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 18 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 16 Dec 2023 04:28:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#18).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()).

The reduced cardinality is stored in new fields 'filteredCardinality_' and 'filteredInputCardinality_', separate from existing fields 'cardinality_' and 'inputCardinality_'. Future work should merge the new cardinality fields with the old cardinality fields after we can validate that the cardinality reduction does not regress memory estimation.

While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead toward ProcessingCost reduction, lower scan fragment parallelism, lower CpuAsk, and increase the chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism or memory estimates.

This patch also adds development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a range of [0.0..1.0] that controls the cardinality reduction scale from runtime filter analysis to help with benchmarking and disabling cardinality reduction if needed (by setting to 0.0). Default to 1.0.

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14744/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 17 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 15 Dec 2023 18:36:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG@12
PS16, Line 12: than
> nit: than
Done

http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@306
PS16, Line 306: getCardinality()
> I found a potential issue in resource estimation of EXCHANGE node.
ps17 stores the reduced cardinality in separate fields instead of replacing the cardinality_ and inputCardinality_ fields. Therefore, memory estimation remains the same before and after the patch, even if this feature is enabled by default.

--
To view, visit http://gerrit.cloudera.org:8080/20498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 17
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: David Rorke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Riza Suminto
Gerrit-Comment-Date: Fri, 15 Dec 2023 18:14:12 +
Gerrit-HasComments: Yes
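The design choice in the reply above, keeping the reduced estimate next to the original instead of overwriting it, might look roughly like the following Java sketch. The class, its constructor, and the accessor behavior are hypothetical; only the idea of a separate filtered cardinality alongside the original cardinality comes from the thread.

```java
// Hypothetical sketch of a plan node that tracks both estimates.
final class PlanNodeCardinalitySketch {
  // Original estimate; in this sketch, memory estimation keeps reading it.
  private final long cardinality;
  // Estimate after runtime filter reduction; -1 means none was recorded.
  private long filteredCardinality = -1;

  PlanNodeCardinalitySketch(long cardinality) {
    this.cardinality = cardinality;
  }

  // Records a reduced estimate; ignored unless it is a real reduction.
  void setFilteredCardinality(long reduced) {
    if (reduced >= 0 && reduced < cardinality) filteredCardinality = reduced;
  }

  // Conservative value, unchanged by the reduction.
  long getCardinality() {
    return cardinality;
  }

  // Value a cost computation would read; falls back to the original.
  long getFilteredCardinality() {
    return filteredCardinality >= 0 ? filteredCardinality : cardinality;
  }

  public static void main(String[] args) {
    PlanNodeCardinalitySketch scan = new PlanNodeCardinalitySketch(1_000_000L);
    scan.setFilteredCardinality(10_000L);
    System.out.println("memory estimate input: " + scan.getCardinality());
    System.out.println("cost estimate input:   " + scan.getFilteredCardinality());
  }
}
```

Because the original value is never mutated, any consumer that is not explicitly switched to the filtered accessor behaves exactly as before.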
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#17).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG@12
PS16, Line 12: that
nit: than

http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@306
PS16, Line 306: getCardinality()
I found a potential issue in the resource estimation of the EXCHANGE node. If getCardinality() is abnormally low here (and in estimateDeferredRPCQueueSize()) after runtime filter reduction, estimatedTotalQueueByteSize may be severely underestimated.

SCAN and JOIN nodes are not impacted. The SCAN node estimate is based on the scan range count before runtime filter reduction. The JOIN node estimate uses cardinality, but of the build side. This patch operates on the probe pipeline, so there is no impact on the JOIN memory estimate.

I think this patch should be modified to focus on using the reduced cardinality for ProcessingCost only. Resource estimation should keep using the original cardinality before runtime filters so that the memory estimate stays conservative.
-- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 14 Dec 2023 23:14:45 + Gerrit-HasComments: Yes
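To make the concern in the comment above concrete, here is a small Java sketch of how a size estimate that is proportional to cardinality shrinks when the input cardinality is reduced. The formula (rows times average row size, capped at a maximum) is purely illustrative and is not the formula ExchangeNode uses; the method name, parameters, and the 64-byte row size are assumptions.

```java
// Illustrative only: NOT the estimate used by Impala's ExchangeNode.
final class ExchangeEstimateSketch {

  // Hypothetical estimate: rows * average row size, capped at maxQueueBytes.
  // An unknown (negative) cardinality falls back to the cap.
  static long estimateQueueBytes(long cardinality, long avgRowBytes, long maxQueueBytes) {
    if (cardinality < 0 || avgRowBytes <= 0) return maxQueueBytes;
    // Compare before multiplying so the product cannot overflow.
    if (cardinality > maxQueueBytes / avgRowBytes) return maxQueueBytes;
    return cardinality * avgRowBytes;
  }

  public static void main(String[] args) {
    long cap = 10L * 1024 * 1024; // 10 MiB cap, an arbitrary example value
    long original = estimateQueueBytes(1_000_000L, 64L, cap);
    long reduced = estimateQueueBytes(10_000L, 64L, cap);
    System.out.println("estimate with original cardinality: " + original);
    System.out.println("estimate with reduced cardinality:  " + reduced);
  }
}
```

If the runtime filter turns out to be less selective than predicted, an estimate taken from the reduced cardinality would be too small, which is why the comment argues for keeping memory estimation on the original value.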
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14651/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 12 Dec 2023 01:17:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 16: (1 comment) ps16 is a rebase. http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java@100 PS15, Line 100: consume > Nit: consumes Done -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 12 Dec 2023 00:51:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#16). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead toward ProcessingCost reduction, lower scan fragment parallelism, and a higher chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism, but might see a lower resource estimate.
This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, with a range of [0.0..1.0], that controls the cardinality reduction scale from runtime filter analysis. It helps with benchmarking and allows disabling cardinality reduction if needed (by setting it to 0.0). Defaults to 1.0. Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M
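One way to picture the RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE option described in the commit message above is as a linear blend between the unfiltered and the fully filtered estimate. The following is only a minimal sketch under that assumption; the class name, the method scaleReduction(), and the linear interpolation itself are hypothetical and are not taken from the patch. The only properties confirmed by the commit message are the [0.0..1.0] range, that 0.0 disables the reduction, and that 1.0 is the default.

```java
// Hypothetical illustration only; not Impala's actual implementation.
final class ReductionScaleSketch {
  /**
   * Blends the original cardinality estimate with the runtime-filter-reduced one.
   * scale = 0.0 keeps the original estimate (reduction disabled),
   * scale = 1.0 applies the full reduction.
   * ASSUMPTION: the blend is linear; the patch may scale differently.
   */
  static long scaleReduction(long cardinality, long filteredCardinality, double scale) {
    if (scale < 0.0 || scale > 1.0) {
      throw new IllegalArgumentException("scale must be within [0.0..1.0]: " + scale);
    }
    // A runtime filter never increases the estimate in this sketch.
    if (filteredCardinality >= cardinality) return cardinality;
    double reduction = (cardinality - filteredCardinality) * scale;
    return cardinality - (long) Math.floor(reduction);
  }

  public static void main(String[] args) {
    System.out.println(scaleReduction(1000, 100, 0.0));
    System.out.println(scaleReduction(1000, 100, 0.5));
    System.out.println(scaleReduction(1000, 100, 1.0));
  }
}
```

Under this assumed model, intermediate values let a benchmark dial the reduction in gradually instead of switching it on or off.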
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 15: Code-Review+1 (1 comment) Just a nit, otherwise LGTM. http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java@100 PS15, Line 100: consume Nit: consumes -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 11 Dec 2023 14:28:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14605/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 06 Dec 2023 20:58:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 15: (4 comments) http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2003 PS14, Line 2003: estScanRangeAfterRuntimeFilter(), getEffectiveNumScanRanges())); > Spelling this out to number of scan range might be better. That way, user c Done http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@812 PS14, Line 812: its > Nit: its. Done http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@569 PS14, Line 569: long scanCardinalityA > Learning from IMPALA-12510, think this should be capped to estimate that at Done http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578 PS14, Line 578: > This should be ceil. 
Done -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 06 Dec 2023 20:32:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#15). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead toward ProcessingCost reduction, lower scan fragment parallelism, and a higher chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism, but might see a lower resource estimate.
This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, with a range of [0.0..1.0], that controls the cardinality reduction scale from runtime filter analysis. It helps with benchmarking and allows disabling cardinality reduction if needed (by setting it to 0.0). Defaults to 1.0. Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java@498 PS14, Line 498: planCtx > Is there a reason you pass a PlannerContext instead of the reduction scale I'd like to follow the precedent set by Planner.computeProcessingCost() and Planner.computeResourceReqs(). They both take a PlannerContext as a parameter and unpack the query options they need inside the method. Reading Frontend.createExecRequest() is also cleaner this way. http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@569 PS14, Line 569: scanRangeSelectivity_ Learning from IMPALA-12510, I think this should be capped so that the estimate allows at least 1 scan range to be read after filtering. http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578 PS14, Line 578: Math.round This should be ceil.
-- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 06 Dec 2023 19:52:34 + Gerrit-HasComments: Yes
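The two ScanNode.java comments in the message above (keep at least 1 scan range after filtering, and use ceil rather than Math.round) can be sketched as follows. This is a minimal illustration under stated assumptions: the class and the helper estScanRangesAfterFilter() are hypothetical names, and the clamping of the selectivity to [0.0, 1.0] is an added safeguard that the review thread does not mention.

```java
// Hypothetical illustration only; not the code from the patch.
final class ScanRangeEstimateSketch {
  /**
   * Estimates how many scan ranges survive file-level runtime filtering.
   * Rounds up and never returns less than 1 for a non-empty scan, so that a
   * very selective filter cannot drive the estimate to zero.
   */
  static long estScanRangesAfterFilter(long numScanRanges, double scanRangeSelectivity) {
    if (numScanRanges <= 0) return 0;
    // ASSUMPTION: selectivity is clamped to [0.0, 1.0] before use.
    double sel = Math.max(0.0, Math.min(1.0, scanRangeSelectivity));
    long estimate = (long) Math.ceil(numScanRanges * sel);
    return Math.max(1L, estimate);
  }

  public static void main(String[] args) {
    // With 10 scan ranges and selectivity 0.04, Math.round(0.4) would give 0.
    System.out.println(Math.round(10 * 0.04));
    System.out.println(estScanRangesAfterFilter(10, 0.04));
  }
}
```

Rounding up errs on the side of a slightly higher estimate, which is the safer direction when the result feeds resource and parallelism decisions.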
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc File be/src/service/query-options-test.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315 PS12, Line 315: TQueryOptions options; > The MAKE_OPTIONDEF macro error when I move options later. Ok, then MAKE_OPTIONDEF somehow uses 'options'. http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java@498 PS14, Line 498: planCtx Is there a reason you pass a PlannerContext instead of the reduction scale as before? If only the reduction scale is needed, I think it's cleaner to only pass that. http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@812 PS14, Line 812: it's Nit: its. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 06 Dec 2023 13:58:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2003 PS14, Line 2003: String.format(" scan-range-selectivity=%.3f", scanRangeSelectivity_) Spelling this out as the number of scan ranges might be better. That way, users can directly compare it against the 'ScanRangesComplete' counter in the profile. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 04 Dec 2023 18:32:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14523/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 27 Nov 2023 23:32:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc File be/src/service/query-options-test.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315 PS12, Line 315: TQueryOptions options; > Done The MAKE_OPTIONDEF macro fails when I move 'options' later. ps14 moves 'options' back to the beginning of the test. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 27 Nov 2023 23:07:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#14). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead toward ProcessingCost reduction, lower scan fragment parallelism, and a higher chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism, but might see a lower resource estimate.
This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, with a range of [0.0..1.0], that controls the cardinality reduction scale from runtime filter analysis. It helps with benchmarking and allows disabling cardinality reduction if needed (by setting it to 0.0). Defaults to 1.0. Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 13: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/14522/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 27 Nov 2023 23:05:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 13: (21 comments) http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@31 PS12, Line 31: mode > Nit: modes. Done http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@35 PS12, Line 35: RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE > Could you elaborate on how it can be used (what values are valid, what they Done http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc File be/src/service/query-options-test.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315 PS12, Line 315: // List of pairs of Key and boolean flag on whether the option is inclusive of 0 and 1. > I'd put it right before the loop. Done http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@316 PS12, Line 316: pair, bool> case_set[]{ > This is not used. Done http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc@1170 PS12, Line 1170: ; > No need for the 'float' literal, especially as it is a double. Done http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@837 PS12, Line 837: exceed > Nit: exceeds. Done http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@849 PS12, Line 849: Default > Nit: Defaults. 
Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474 PS12, Line 474: only applies at HDFS columnar file > Now 'isAllColumnarScanner' is replaced with 'evalAtRowLevel', Kudu is also Yes. Mentioned Kudu in the comment. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@485 PS12, Line 485: reduc > Maybe 'reduced' would be better, "lower o. c. estimate" may suggest a "lowe Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@486 PS12, Line 486: at > Nit: is? Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS12, Line 488: below > It can be a bit confusing that this join node is at the bottom of the stack Added "in node tree". http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514 PS12, Line 514: this, reducedCardinality, partitionSelectivities); > I think that it would be more readable if the body of the loop would be Added RuntimeFilter.reducedCardinalityForScanNode(). http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@531 PS12, Line 531: dinality, scaledPartSel); : } > I am not sure if this is correct - all max selectivities per column will be Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@561 PS12, Line 561: > Can it be multiple join nodes or just one? There can be multiple join nodes in nodeStack, and they are connected to each other along the probe pipeline. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@565 PS12, Line 565: > Could you also mention 'reductionScale' here? 
Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578 PS12, Line 578: dinality_ * scanRangeSelectivity_ > Could extract this as 'currentCardinality'. Done http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@591 PS12, Line 591: // with the least > Shouldn't it be 'partitionSelectivity'? Or is it the same thing? It is the same thing. I name it scanRangeSelectivity_ because from ScanNode perspective, it will be applied to reduce the estimated number of scan range that survive file-level filter and actually being read. Scan range might be 1 range per file or multiple range per file, depending on target file system. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@609 PS12, Line 609: > Why do we leave out the last one? Clarified in the new comment below this. http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#13). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and increase the chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism, but might see lower resource estimates. 
This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a value in the range [0.0..1.0] that controls the scale of the cardinality reduction from runtime filter analysis. It helps with benchmarking and allows disabling cardinality reduction if needed (by setting it to 0.0). Defaults to 1.0. Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M
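The commit message above states only that RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE lies in [0.0..1.0], that 0.0 disables the reduction, and that the default is 1.0. How the scale is combined with the estimate is not spelled out in this thread, so the following is a minimal sketch that assumes a linear interpolation between the original and the filter-reduced cardinality. The class and method names (ReductionScaleSketch, checkScale, scaledCardinality) are hypothetical and are not taken from the Impala source.

```java
// Hypothetical illustration only; not the Impala implementation.
final class ReductionScaleSketch {
  private ReductionScaleSketch() {}

  /** Rejects values outside the inclusive range [0.0, 1.0] (and NaN). */
  public static double checkScale(double scale) {
    if (Double.isNaN(scale) || scale < 0.0 || scale > 1.0) {
      throw new IllegalArgumentException("scale must be in [0.0, 1.0]: " + scale);
    }
    return scale;
  }

  /**
   * ASSUMPTION: the scale linearly interpolates between the original estimate
   * (scale = 0.0, reduction disabled) and the fully reduced estimate (scale = 1.0).
   * Negative inputs are treated as "unknown cardinality" and left untouched.
   */
  public static long scaledCardinality(long original, long filtered, double scale) {
    checkScale(scale);
    if (original < 0 || filtered < 0) return original;
    // Never let a filter increase the estimate.
    long reduction = Math.max(0L, original - filtered);
    return original - Math.round(reduction * scale);
  }
}
```

Under this assumption, an original estimate of 1000 rows with a filtered estimate of 100 rows stays at 1000 for scale 0.0, becomes 550 for scale 0.5, and drops to 100 for scale 1.0.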
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 12: (16 comments) http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@31 PS12, Line 31: mode Nit: modes. http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@35 PS12, Line 35: RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE Could you elaborate on how it can be used (what values are valid, what they mean) and what the default value is? http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc File be/src/service/query-options-test.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315 PS12, Line 315: TQueryOptions options; I'd put it right before the loop. http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@316 PS12, Line 316: QueryConstants qc; This is not used. http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc@1170 PS12, Line 1170: f No need for the 'float' literal, especially as it is a double. http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@837 PS12, Line 837: exceed Nit: exceeds. http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@849 PS12, Line 849: Default Nit: Defaults. 
http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474 PS12, Line 474: only applies at HDFS columnar file Now 'isAllColumnarScanner' is replaced with 'evalAtRowLevel', Kudu is also included, isn't it? http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@485 PS12, Line 485: lower Maybe 'reduced' would be better, "lower o. c. estimate" may suggest a "lower estimate", i.e. a lower bound. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@486 PS12, Line 486: in Nit: is? http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS12, Line 488: below It can be a bit confusing that this join node is at the bottom of the stack but there are nodes below it. I know other nodes are below it in the node tree, not in the stack, but this could be made explicit. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@561 PS12, Line 561: join nodes Can it be multiple join nodes or just one? http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@565 PS12, Line 565:* the simplest join cardinality formula from JoinNode.computeGenericJoinCardinality(). Could you also mention 'reductionScale' here? http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578 PS12, Line 578: nodeStack.get(i).getCardinality() Could extract this as 'currentCardinality'. http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@591 PS12, Line 591: scanRangeSelectivity_ Shouldn't it be 'partitionSelectivity'? Or is it the same thing? 
http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@609 PS12, Line 609: > 1 Why do we leave out the last one? -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 Nov 2023 10:18:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 12: (5 comments) http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514 PS12, Line 514: long buildKeyNdv = filter.getNdvEstimate(); I think it would be more readable if the body of the loop were a separate function - if I understand correctly, it has a pretty clear role of computing a cardinality per filter and updating the partitionSelectivities. It may also make sense to move the function to RuntimeFilter, e.g. CalculateSelectivityForScanNode(). http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@531 PS12, Line 531: If column name is unknown, compare the selectivity : // against filters from other unknown column (colName == ""). I am not sure if this is correct - all max selectivities per column will be multiplied later, and it is possible that the "unknown" selectivity is related to a column already in the list, just wrapped in some expression. It may be safer to ignore these partition filters for now. http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test: http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@11 PS12, Line 11: ss_sold_date_sk = d_date_sk Can you also add a test where there is an expression in the key, to test the relevant path in the code? Also, a similar test could be added with a partition key. 
http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@91 PS12, Line 91: partitions=1824/182 Is it possible to see the effect of runtime filters on partition selectivity in some test? http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@299 PS12, Line 299: left Can you also add a test for semi left join? Also, it seems strange that left anti join is supported, as it is not included in isSelectiveAndReducingJoin. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 20 Nov 2023 18:13:16 + Gerrit-HasComments: Yes
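The concern raised on ScanNode.java line 531 can be made concrete with a small sketch. It assumes that "max selectivity per column" means keeping the most selective filter for each partition column (the smallest surviving fraction) and then multiplying the per-column values, and it follows the reviewer's suggestion of ignoring filters whose column name is unknown (colName == ""). The types and names below (PartitionSelectivitySketch, FilterSel, combine) are hypothetical and only illustrate the idea; they are not the code under review.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Impala implementation.
final class PartitionSelectivitySketch {
  private PartitionSelectivitySketch() {}

  /** A partition filter's target column name and the fraction of partitions it keeps. */
  public static final class FilterSel {
    final String colName;
    final double selectivity;

    public FilterSel(String colName, double selectivity) {
      this.colName = colName;
      this.selectivity = selectivity;
    }
  }

  /**
   * Keeps the most selective filter per known column, then multiplies the
   * per-column values. Filters with an unknown column (empty name) are skipped,
   * since they may target a column already counted, wrapped in an expression,
   * and multiplying them in could reduce the estimate twice.
   */
  public static double combine(List<FilterSel> filters) {
    Map<String, Double> bestPerColumn = new HashMap<>();
    for (FilterSel f : filters) {
      if (f.colName.isEmpty()) continue;
      bestPerColumn.merge(f.colName, f.selectivity, Math::min);
    }
    double combined = 1.0;
    for (double sel : bestPerColumn.values()) combined *= sel;
    return combined;
  }
}
```

With filters (a, 0.5), (a, 0.2), (b, 0.5) and one unknown-column filter of 0.1, this sketch yields 0.2 * 0.5 = 0.1. Counting the unknown filter as an extra column would give 0.01, which is the double reduction the comment warns about.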
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14474/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 17 Nov 2023 20:29:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 12: (4 comments) Patch set 12 changes the workload of testRuntimeFilterCardinalityReduction to tpcds_parquet so that row-level runtime filters are included in the cardinality reduction. http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test: http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@26 PS11, Line 26: ss_sold_date_sk = d_date_sk > Can you add a test with more than 1 equi join predicates? Added a test with an extra predicate (sr_returned_date_sk = d_date_sk). http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@40 PS11, Line 40: |--01:SCAN HDFS [tpcds_parquet.store_returns] : | HDFS partitions=1/1 files=1 size=15.43MB : | row-size=16B cardinality=287.51K : | > Would the planner also reduce build side scan node cardinality if there was Added a test that also joins against time_dim for this. The build side should see no cardinality reduction if it consists of a scan node only. http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@46 PS11, Line 46: RF001 -> ss_sold_date_sk > Can you add an example with more than 1 runtime filters consumed by the sca In the new test that also joins against time_dim, the cardinality estimate is reduced further to 61. 
http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@48 PS11, Line 48: > Are the parallel/distributed plans useful in the tests? At the first glance Removed DISTRIBUTEDPLAN and PARALLELPLANS. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 17 Nov 2023 20:07:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#12). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and increase the chance of query assignment to the smaller executor group set. Other execution mode will see no change in their execution parallelism, but might see lower resource estimates. 
This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and disabling cardinality reduction if needed (by setting it to 0.0). Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M 
testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M
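The commit message above says the runtime filter selectivity is derived from the simplest join cardinality formula, naming JoinNode.computeGenericJoinCardinality() without showing it. As a rough sketch, assume the textbook equi-join estimate |L| * |R| / max(NDV(L.key), NDV(R.key)) and assume the selectivity seen by the probe-side scan is that estimate divided by the probe cardinality, capped at 1.0. Both assumptions, and every name below (GenericJoinSelectivitySketch, genericJoinCardinality, filterSelectivity), are illustrative and are not copied from the patch.

```java
// Hypothetical illustration only; not the Impala implementation.
final class GenericJoinSelectivitySketch {
  private GenericJoinSelectivitySketch() {}

  /**
   * ASSUMPTION: textbook equi-join estimate
   *   |probe| * |build| / max(NDV(probe key), NDV(build key)).
   */
  public static long genericJoinCardinality(
      long probeCard, long buildCard, long probeKeyNdv, long buildKeyNdv) {
    // Guard against zero or unknown NDV so the division is well defined.
    double maxNdv = Math.max(1L, Math.max(probeKeyNdv, buildKeyNdv));
    return Math.round((double) probeCard * (double) buildCard / maxNdv);
  }

  /**
   * ASSUMPTION: the fraction of probe rows expected to pass the runtime filter
   * is the join estimate over the probe cardinality, never more than 1.0.
   */
  public static double filterSelectivity(
      long probeCard, long buildCard, long probeKeyNdv, long buildKeyNdv) {
    if (probeCard <= 0) return 1.0;
    long joinCard = genericJoinCardinality(probeCard, buildCard, probeKeyNdv, buildKeyNdv);
    return Math.min(1.0, (double) joinCard / (double) probeCard);
  }
}
```

For example, a probe side of 1000 rows with 100 distinct keys joined to a build side of 10 rows with 10 distinct keys gives an estimate of 1000 * 10 / 100 = 100 rows, so under these assumptions the filter would keep about 10% of the probe-side scan.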
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 11: (6 comments) http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025 PS7, Line 1025: isLeftOuterJoin > Left outer join may still eligible to be included in nodeStack if it is sel Another join type: shouldn't this be applicable to semi join? http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS7, Line 488: the least output cardinality > Added testRuntimeFilterCardinalityReduction. Thanks, it is much more understandable for me now! http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test: http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@26 PS11, Line 26: ss_sold_date_sk = d_date_sk Can you add a test with more than 1 equi join predicates? http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@40 PS11, Line 40: |--01:SCAN HDFS [tpcds.store_returns] : | HDFS partitions=1/1 files=1 size=31.19MB : | row-size=16B cardinality=287.51K : | Would the planner also reduce build side scan node cardinality if there was a bloom filter consumed there? Can you add a test for this? 
http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@46 PS11, Line 46: RF000 -> ss_sold_date_sk Can you add an example with more than 1 runtime filters consumed by the scanner? http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@48 PS11, Line 48: DISTRIBUTEDPLAN Are the parallel/distributed plans useful in the tests? At the first glance they are just adding noise. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 17 Nov 2023 16:21:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14467/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 17 Nov 2023 00:40:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 11: Ran exhaustive tests overnight and fixed 2 more tests in patch set 11. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 17 Nov 2023 00:18:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#11). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and increase the chance of query assignment to the smaller executor group set. Other execution mode will see no change in their execution parallelism. This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and disabling cardinality reduction if needed (by setting it to 0.0). 
Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14458/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 16 Nov 2023 01:20:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14456/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 16 Nov 2023 01:04:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 10: Patch set 10 adds an exception regarding inputCardinalityEst for Kudu. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 16 Nov 2023 00:54:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#10). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead to ProcessingCost reduction, lower scan fragment parallelism, and an increased chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism. This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and with disabling cardinality reduction if needed (by setting it to 0.0). 
Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461 PS7, Line 461: columnar sc > Changed this variable name to evalAtRowLevel to better represent the intent Patch set 9 includes the Kudu scanner as well. I tested it myself and saw that, most of the time, the reduced cardinality is still above the actual number of rows returned by Kudu. http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test: http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@429 PS8, Line 429: # Removing sr_ticket_number=ss_ticket_number predicate will turn 03:HASH JOIN into : # an expanding join and makes probe pipeline ineligible for cardinality reduction. > Still unsure about this testcase. It looks like reducing cardinality of 00: I stand by the current algorithm: nodeStack must not be empty, and scan node cardinality reduction should only involve runtime filters coming from join nodes in nodeStack. It might miss some reduction opportunities, but it makes the propagation safer to reason about. 
-- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 16 Nov 2023 00:43:06 + Gerrit-HasComments: Yes
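For readers following the nodeStack discussion above, the rule can be sketched roughly as follows. This is a simplified illustration and not the code in the patch: the class ProbePipelineSketch, its Node type, and the methods isReducingJoin() and collectPipeline() are hypothetical names, and the traversal is assumed to walk only the probe side from the plan root down to the scan. A join counts as reducing when its output cardinality is lower than its probe input cardinality; an expanding join clears the stack, so joins above it never contribute to the scan's reduction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the probe-pipeline selection; not Impala's actual planner code. */
final class ProbePipelineSketch {
  enum Kind { SCAN, EXCHANGE, JOIN }

  /** Minimal stand-in for a plan node: a label, a kind, a cardinality, and a probe-side child. */
  static final class Node {
    final String label;
    final Kind kind;
    final long cardinality;
    final Node probeChild;

    Node(String label, Kind kind, long cardinality, Node probeChild) {
      this.label = label;
      this.kind = kind;
      this.cardinality = cardinality;
      this.probeChild = probeChild;
    }
  }

  /** A join is treated as reducing when its output cardinality is below its probe input. */
  static boolean isReducingJoin(Node join) {
    return join.kind == Kind.JOIN
        && join.probeChild != null
        && join.cardinality < join.probeChild.cardinality;
  }

  /**
   * Walks the probe side from 'root' down to the scan and returns the labels of the
   * joins that form a contiguous reducing pipeline directly above the scan.
   * An expanding join clears everything collected above it.
   */
  static Deque<String> collectPipeline(Node root) {
    Deque<String> nodeStack = new ArrayDeque<>();
    for (Node n = root; n != null; n = n.probeChild) {
      if (n.kind == Kind.JOIN) {
        if (isReducingJoin(n)) {
          nodeStack.push(n.label);
        } else {
          nodeStack.clear();
        }
      }
      // Exchanges are passed through unchanged; the scan has no probe child and ends the walk.
    }
    return nodeStack;
  }
}
```

With an expanding 03:HASH JOIN between 04:HASH JOIN and the scan, the returned stack is empty, which corresponds to the test case discussed above where the scan is left unreduced.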
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#9). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead to ProcessingCost reduction, lower scan fragment parallelism, and an increased chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism. This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and with disabling cardinality reduction if needed (by setting it to 0.0). 
Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 8: (10 comments) http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025 PS7, Line 1025: isLeftOuterJoin > Can you explain why is it supported for left outer join? We shouldn't have A left outer join may still be eligible for inclusion in nodeStack if it is selective and reducing (output cardinality < input cardinality), even though it does not produce a runtime filter for the leftmost scan. Note that this method exists to clear the nodeStack state if the traversal arrives at an expanding join. runtime-filter-cardinality-reduction.test shows an example. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461 PS7, Line 461: lter call in > I think that cardinality reduction should be also applied to Kudu scanners. Changed this variable name to evalAtRowLevel to better represent the intention. The Kudu scanner does not evaluate filters at the row level (there is no call to the EvalRuntimeFilter function), but I do remember that Impala can push runtime filters down to Kudu, and I'm not sure what the behavior will be in that case. Please let me know what you think. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468 PS7, Line 468: > Can you add comments on why does isAllColumnarScanner matter? What happens No row-group or row-level filtering happens when scanning text files. 
I have not confirmed whether a non-partition filter is still generated for an all-text scan. I think it is correct to generate a non-partition runtime filter for a mixed file format scan, but we should not use it to reason about cardinality reduction since it does not apply to all files. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480 PS7, Line 480: : /** :* Given a contiguous probe pipeline 'nodeStac > Do we guarantee this somehow? If yes, then there could be a Precondition i This is guaranteed by JoinNode.isSelectiveAndReducingJoin(). nodeStack will be cleared if the traversal arrives at an expanding join. Added a sanity check in PS 8. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS7, Line 488: ed output cardinality and par > I am still trying to wrap my head around this algorithm. Added testRuntimeFilterCardinalityReduction. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513 PS7, Line 513: e(node instanceof Join > Does this case handle the case when the key is result of an expression and Replaced this with filter.getTargetExpr(id_).getNumDistinctValues(). For the key, it is already handled through filter.getNdvEstimate() and filter.getBuildKeyNumCardinality(). http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514 PS7, Line 514: JoinNode join = (JoinNode) node; > Should we be at this point if probe side key ndv is unknown? It may be safe Done http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522 PS7, Line 522: ts(); > typo Done http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544 PS7, Line 544:colName = ta > The name is a bit misleading, as it should be actually the lowestJoinCard. 
Done http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test: http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@429 PS8, Line 429: # Removing sr_ticket_number=ss_ticket_number predicate will turn 03:HASH JOIN into : # an expanding join and makes probe pipeline ineligible for cardinality reduction. I am still unsure about this test case. It looks like reducing the cardinality of 00:SCAN by RF000 is still valid, even though 04:HASH JOIN is not part of nodeStack because 03:HASH JOIN is expanding. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id:
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14455/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 15 Nov 2023 20:59:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#8). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead to ProcessingCost reduction, lower scan fragment parallelism, and an increased chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism. This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and with disabling cardinality reduction if needed (by setting it to 0.0). 
Testing: - Add PlannerTest.testRuntimeFilterCardinalityReduction - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025 PS7, Line 1025: isLeftOuterJoin Can you explain why it is supported for left outer join? We shouldn't have any runtime filter pushed down to the probe side in that case. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461 PS7, Line 461: HdfsScanNode I think that cardinality reduction should also be applied to Kudu scanners. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468 PS7, Line 468: isAllColumnarScanner Can you add comments on why isAllColumnarScanner matters? What happens if it is a text file? Do we generate runtime filters at all in that case if it is not a partition filter? http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480 PS7, Line 480: he probe pipeline :* 'nodeStack' must have original cardinality estimate that continues decreasing from :* scan node up towards the highest join node. Do we guarantee this somehow? If yes, then there could be a Precondition in the loop. Generally this seems true for inner joins unless there are duplicates on the build side. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS7, Line 488: getReducedCardinalityByFilter I am still trying to wrap my head around this algorithm. 
I think that it would be very helpful to have one or two tests that provide more information, so that readers can track how cardinalities are reduced, e.g. a single-join and a multi-join query with the NDVs in a comment. Adding a summary in a comment could also be useful to show that the cardinality reductions are realistic compared to the cardinalities during query execution. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513 PS7, Line 513: targetSlot == null ? - Does this handle the case when the key is the result of an expression and is not simply a slotRef? Every Expr has getNumDistinctValues(), which is filled based on its children expressions, so we could use that as an NDV estimate. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514 PS7, Line 514: if (scanColumnNdv < 0) scanColumnNdv = cardinality_; // fallback Should we be at this point if the probe side key NDV is unknown? It may be safer to not do any cardinality reduction if we don't know the NDV. This is a bit different from doing the estimation in join nodes, where we really need to "guess" something even if there are no stats. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522 PS7, Line 522: colum typo http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544 PS7, Line 544: highestJoinCard The name is a bit misleading, as it should actually be the lowestJoinCard. 
-- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 15 Nov 2023 16:06:50 + Gerrit-HasComments: Yes
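The NDV concern raised in this review can be illustrated with a small sketch. It is not the patch's getReducedCardinalityByFilter(): the class FilterSelectivitySketch and its method names are hypothetical, and the formula is only the commonly cited generic join estimate, |probe| * |build| / max(NDV(probe key), NDV(build key)), which is assumed here to be what JoinNode.computeGenericJoinCardinality() refers to. The sketch skips reduction entirely when any statistic is unknown, as suggested in the comment on line 514, and never raises the scan's estimate above its original value.

```java
/** Hypothetical sketch of filter-based scan cardinality reduction; not Impala's actual code. */
final class FilterSelectivitySketch {
  /** Sentinel meaning "no reduction can be computed". */
  static final long UNKNOWN = -1;

  /** Generic join estimate: |probe| * |build| / max(NDV(probe key), NDV(build key)). */
  static long genericJoinCardinality(
      long probeCard, long buildCard, long probeNdv, long buildNdv) {
    double maxNdv = Math.max(probeNdv, buildNdv);
    return Math.round((double) probeCard * buildCard / maxNdv);
  }

  /**
   * Returns the scan cardinality after applying one runtime filter, or UNKNOWN when
   * any input statistic is missing. No fallback guess is made for a missing NDV.
   */
  static long reducedScanCardinality(
      long scanCard, long buildCard, long scanColumnNdv, long buildKeyNdv) {
    if (scanCard < 0 || buildCard < 0 || scanColumnNdv <= 0 || buildKeyNdv <= 0) {
      return UNKNOWN;
    }
    long joinCard = genericJoinCardinality(scanCard, buildCard, scanColumnNdv, buildKeyNdv);
    // A filter can only remove rows, so cap the result at the original estimate.
    return Math.min(scanCard, joinCard);
  }
}
```

For example, a scan of 1000 rows with 100 distinct key values, filtered by a build side of 10 rows with 10 distinct keys, is estimated at 1000 * 10 / 100 = 100 rows under these assumptions.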
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14443/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 14 Nov 2023 22:05:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1816 PS6, Line 1816: TQueryExecRequest result = new TQueryExecRequest(); : if (options.runtime_filter_cardinality_reduction_scale > 0) { : Planner.reduceCardinalityByRuntimeFilter( : planRoots, options.runtime_filter_cardinality_reduction_scale); > This cardinality reduction happen before cost and resource requirement calc Added RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE option. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 14 Nov 2023 21:38:44 + Gerrit-HasComments: Yes
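As a rough illustration of how a scale option like this might gate the reduction: the quoted Frontend.java snippet only shows that the reduction is skipped when the option is not greater than 0, and this thread does not show how the scale is used inside Planner.reduceCardinalityByRuntimeFilter(). The linear interpolation between the original and the reduced estimate below is therefore purely an assumption, and ReductionScaleSketch and applyScale() are hypothetical names.

```java
/** Hypothetical sketch of a reduction scale knob; the interpolation rule is an assumption. */
final class ReductionScaleSketch {
  /**
   * Returns the cardinality to use after scaling the reduction.
   * scale <= 0 disables the reduction; scale >= 1 applies it in full.
   */
  static long applyScale(long originalCard, long reducedCard, double scale) {
    // Keep the original estimate if the option is off or there is nothing to reduce.
    if (scale <= 0 || reducedCard < 0 || reducedCard >= originalCard) {
      return originalCard;
    }
    double effectiveScale = Math.min(scale, 1.0);
    long removedRows = Math.round((originalCard - reducedCard) * effectiveScale);
    return originalCard - removedRows;
  }
}
```

Under this assumed rule, a scale of 0.5 applied to an original estimate of 1000 rows and a reduced estimate of 100 rows yields 550 rows, which would make the option usable for benchmarking intermediate settings.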
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#7). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and increase the chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism. This patch also adds a development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and disabling cardinality reduction if needed (by setting it to 0.0). 
Testing: - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1816 PS6, Line 1816: Planner.reduceCardinalityByRuntimeFilter(planRoots); : Planner.computeProcessingCost(planRoots, result, planner.getPlannerCtx()); : Planner.computeResourceReqs(planRoots, queryCtx, result, : planner.getPlannerCtx(), planner.getAnalysisResult().isQueryStmt()); This cardinality reduction happens before the cost and resource requirement calculation. It is possible that a lower cardinality results in a lower resource estimate, albeit only for ScanNode, JoinNode, and ExchangeNode. Having a query option to disable or scale the cardinality reduction might be desirable so that users can revert to the old behavior. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 14 Nov 2023 16:01:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14399/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 09 Nov 2023 18:55:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 6: (18 comments) http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@18 PS5, Line 18: contiguou > Nit: contiguous Done http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26 PS5, Line 26: exe > Nit: modes. Done http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26 PS5, Line 26: reduction is > "... reduction is present/available in all ..." Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@454 PS5, Line 454: tility m > Now that it's public, is it still internal? Removed this. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@475 PS5, Line 475: computeGenericJoinCardina > This method was renamed. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1180 PS5, Line 1180: ScanNode, ExchangeNode > Aren't also ExchangeNodes included? Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@710 PS5, Line 710: || buildSlotRef.getDesc().getParent() == null > Is it possible that buildSlotRef.getDesc().getParent() is null? Thanks! I added that check. 
http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@22 PS5, Line 22: import java.util.List; > Seems to be unused. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@25 PS5, Line 25: > Seems to be unused. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@102 PS5, Line 102: tivity_ = 1.0; > It is not set there, and I can't find any other place where it's set. Thanks for catching this! This should be assigned in ScanNode.reduceCardinalityByRuntimeFilter(). http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@456 PS5, Line 456:* join node id to list of runtime filters from it. > Could you describe the algorithm also here? Added comments. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@470 PS5, Line 470: // Row-level runtime filtering, however, only applies at columnar file format. > Could extract this loop into a function. Moved it to groupFiltersForCardinalityReduction(). http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474 PS5, Line 474: > Nit: applies Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@475 PS5, Line 475: > Nit: applies (present tense) would be better. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@487 PS5, Line 487:*/ > Could extract this loop into a function. Moved it to getReducedCardinalityByFilter(). 
http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@510 PS5, Line 510: ter.get > Nit: apply Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@517 PS5, Line 517: cardOnThisJoin = Math.min(cardOnThisJoin, estCardAfterFilter); > If 'colName' isn't updated here for more than one column, those columns wil Yes, that is the intention. More distinct columns will lead to a lower partitionSelectivity. If the columns cannot be distinguished, pick the one with the lowest selectivity among them. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@528 PS5, Line 528: colName = targetSlot.getDesc().getColumn().getName(); > I don't understand why we choose the max here, we've been reducing 'reduced The algorithm relies on the original cardinality estimate and assumes that the scan will return at least as many rows as the highest join node. Updated the code and comment to highlight this. -- To view,
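The min/max reasoning in the two replies above can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: the class ScanCardinalityClampSketch, the method reduce(), and the way estimates are grouped per join are hypothetical, and the real implementation may combine filters on distinct columns differently. It only shows the two clamps discussed in the review: taking the minimum among the estimates from one join, and never going below the cardinality of the highest join node in the pipeline.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and grouping are assumptions.
final class ScanCardinalityClampSketch {

  /**
   * @param scanCard         original scan cardinality estimate
   * @param estimatesPerJoin for each join, the estimated scan cardinalities
   *                         after applying each of that join's filters
   * @param highestJoinCard  cardinality of the highest join in the pipeline
   */
  static long reduce(long scanCard, List<List<Long>> estimatesPerJoin,
      long highestJoinCard) {
    long reduced = scanCard;
    for (List<Long> estimates : estimatesPerJoin) {
      long cardOnThisJoin = reduced;
      // Filters that cannot be told apart: keep the lowest estimate.
      for (long est : estimates) {
        cardOnThisJoin = Math.min(cardOnThisJoin, est);
      }
      reduced = Math.min(reduced, cardOnThisJoin);
    }
    // The scan is assumed to return at least as many rows as the highest join
    // node outputs, but never more than the original estimate.
    return Math.min(scanCard, Math.max(reduced, highestJoinCard));
  }

  public static void main(String[] args) {
    List<List<Long>> perJoin =
        Arrays.asList(Arrays.asList(400L, 250L), Arrays.asList(600L));
    System.out.println(reduce(1000L, perJoin, 100L));
    System.out.println(reduce(1000L, perJoin, 300L));
  }
}
```

The final Math.max is what the reviewer asked about: without it, the scan estimate could drop below the number of rows that the joins above it are already expected to produce.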
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#6). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and increase the chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism. Testing: - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Pass core tests. 
Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 5: (18 comments) Thanks Riza http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@18 PS5, Line 18: contigous Nit: contiguous http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26 PS5, Line 26: mode Nit: modes. http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26 PS5, Line 26: reduction in "... reduction is present/available in all ..." http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@454 PS5, Line 454: internal Now that it's public, is it still internal? http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@475 PS5, Line 475: getGenericJoinCardinality This method was renamed. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1180 PS5, Line 1180: ScanNode and JoinNodes Aren't also ExchangeNodes included? http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@710 PS5, Line 710: || buildSlotRef.getDesc().getParent().getTable() == null Is it possible that buildSlotRef.getDesc().getParent() is null? 
http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@22 PS5, Line 22: import java.util.HashSet; Seems to be unused. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@25 PS5, Line 25: import java.util.Set; Seems to be unused. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@102 PS5, Line 102: Set in Planner.applyRuntimeFilterSelectivity() It is not set there, and I can't find any other place where it's set. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@456 PS5, Line 456: protected void reduceCardinalityByRuntimeFilter(Stack nodeStack) { Could you describe the algorithm also here? http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@470 PS5, Line 470: for (RuntimeFilterGenerator.RuntimeFilter filter : getRuntimeFilters()) { Could extract this loop into a function. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474 PS5, Line 474: aplied Nit: applies http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@475 PS5, Line 475: applied Nit: applies (present tense) would be better. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@487 PS5, Line 487: for (int i = nodeStack.size() - 1; i >= 0; i--) { Could extract this loop into a function. 
http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@510 PS5, Line 510: applies Nit: apply http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@517 PS5, Line 517: colName = targetSlot.getDesc().getColumn().getName(); If 'colName' isn't updated here for more than one column, those columns will be handled as if they were the same. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@528 PS5, Line 528: reducedCardinality = Math.max(reducedCardinality, highestJoinCard); I don't understand why we choose the max here: we've been reducing 'reducedCardinality' so far and 'nodeStack.get(0).getCardinality()' was not yet updated. Could you explain? Thanks. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 5: Patch set 5 is a rebase to catch up with the latest toolchain. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 07 Nov 2023 19:41:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14288/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 31 Oct 2023 16:56:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#4). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contigous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). While this cardinality reduction in all execution mode (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and increase the chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism. Testing: - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Pass core tests. 
Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test M
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@15 PS3, Line 15: This patch applies runtime filter selectivity to lower cardinality : estimates of scan nodes and certain join nodes above them after runtime : filter generation and before resource requirement computation. > I think that ideally we shouldn't use join node selectivity, but the NDV of I think I understand your concern. In conclusion, is it correct that I should make an implementation similar to JoinNode.getGenericJoinCardinality(), but against incoming runtime filters instead of join conjuncts? https://github.com/apache/impala/blob/c244aadcf367360e52807a84e7fba8b6237651fd/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L404-L411 http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@34 PS3, Line 34: Testing: > It would be nice to have some targeted tests for edge cases, e.g. missing s Maybe I should consider enabling this for all situations, rather than exclusively with COMPUTE_PROCESSING_COST=1, to ease testing. I will look around. http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java@538 PS3, Line 538: CardinalityRefinerVisitor > I don't have a clear plan, but couldn't this be turned into a recursive fun The last time I tried to implement recursive calls over the PlanNode tree, I was hit by a StackOverflow error. It could be a flaw in my implementation, though. I'll try making it recursive again. 
-- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 24 Oct 2023 21:03:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 3: (3 comments) The implementation looks good to me, but I have some high-level questions. http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@15 PS3, Line 15: This patch applies runtime filter selectivity to lower cardinality : estimates of scan nodes and certain join nodes above them after runtime : filter generation and before resource requirement computation. I think that ideally we shouldn't use join node selectivity, but the NDV of the key on the build side. Generally we assume that selectivity on the join node will also reduce the NDV of the key, so the two become interchangeable, but I would prefer if this logic would stay within join node/runtime filter code, and not spread to other nodes. This is how I see the steps that lead to cardinality reduction in the scan node:
1. ndv(key) is calculated in the join builder based on build side column stats and estimated selectivity
2. a runtime filter is created for key, which also has an fpp based on ndv(key) and bloom filter size
3. the scan node can calculate bloom filter selectivity based on its own ndv (after applying other predicates), the ndv from the bloom filter and the fpp of the bloom filter
This probably gives the same results as the current solution, but makes it easier to think about the effect of other predicates / runtime filters / duplicate keys on the build side. A good example where this could help is a join with a huge number of duplicates + a very selective filter on the build side. We could assume that the build side ndv(key) is reduced due to the selective predicate, but getJoinNodeSelectivity() would still be > 0 due to the duplicate matches. So we wouldn't reduce scanner cardinality, even though we could assume that many rows are dropped based on the bloom filter. http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@34 PS3, Line 34: Testing: It would be nice to have some targeted tests for edge cases, e.g. missing stats, duplicate matches in join, effect of other predicates. http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java@538 PS3, Line 538: CardinalityRefinerVisitor I don't have a clear plan, but couldn't this be turned into a recursive function in PlanNode? -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 24 Oct 2023 09:17:30 + Gerrit-HasComments: Yes
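The third step in the comment above (scan-side selectivity from the scan's own NDV, the build-side NDV, and the filter's fpp) can be illustrated with a small sketch. This is one plausible reading, not code from the patch or from Impala: the class and method names are hypothetical, and the formula assumes uniformly distributed keys and that every build-side key also occurs on the scan side.

```java
// Hypothetical sketch; the names and the formula are assumptions, not Impala code.
final class BloomSelectivitySketch {
  // Estimated fraction of scan rows that pass a bloom filter.
  //   scanNdv:  distinct key values on the scan side, after other predicates
  //   buildNdv: distinct key values inserted into the filter by the build side
  //   fpp:      false-positive probability of the filter
  static double estimate(double scanNdv, double buildNdv, double fpp) {
    // Missing stats: assume the filter removes nothing.
    if (scanNdv <= 0 || buildNdv < 0) return 1.0;
    // Share of scan keys that truly have a match on the build side.
    double matchFraction = Math.min(1.0, buildNdv / scanNdv);
    // Non-matching keys still pass with probability fpp.
    double selectivity = matchFraction + (1.0 - matchFraction) * fpp;
    return Math.min(1.0, selectivity);
  }

  // Applies a selectivity to a cardinality; a negative (unknown) value stays unknown.
  static long filteredCardinality(long scanCardinality, double selectivity) {
    if (scanCardinality < 0) return scanCardinality;
    return Math.round(scanCardinality * selectivity);
  }
}
```

Under these assumptions the estimate depends only on distinct key counts, so duplicate rows on the build side do not inflate it, which is the case the comment raises against relying on join node selectivity.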
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG@31 PS2, Line 31: mode > Nit: modes Done -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 19 Oct 2023 23:10:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#3). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. This cardinality reduction is currently only applied in cost-based planning mode (COMPUTE_PROCESSING_COST option is True) to avoid potential regression in regular planning mode. Cost-based planning mode can benefit the most from reduced scan cardinality. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and an increased chance of query assignment to the smaller executor group set. We can consider enabling this cardinality reduction technique for all planning modes after more thorough performance evaluation (which requires more planner test changes). Testing: - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. 
Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test 5 files changed, 465 insertions(+), 253 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20498/3 -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 2: (4 comments) Thanks Riza. http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG@31 PS2, Line 31: mode Nit: modes http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@132 PS1, Line 132: d la > I use Long here in case the original value cardinality_ itself is unknown ( Ok, I understand now. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@386 PS1, Line 386: if (originalCardinality_ != null) { > Added "changed from". Please check the output in tpcds-processing-cost.test I think it's ok like this. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658 PS1, Line 658: > I chose to clarify with comment instead. Calling getEstFpp() on other kind It's ok if all other filters are deterministic, i.e. false positives are never returned. 
-- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 19 Oct 2023 15:00:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14195/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 18 Oct 2023 00:44:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 2: (16 comments) http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@18 PS1, Line 18: select > Nit: selects Done http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@20 PS1, Line 20: produce > Nit: produces. Done http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25 PS1, Line 25: option is True) > It's not clear to me how it explains that the cardinality reduction is only Clarified the intention in the commit message. I plan to evaluate this in a smaller scope first before enabling it in all planning modes. http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25 PS1, Line 25: mode (CO > Typo: PROCESSING Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@128 PS1, Line 128: // invalid: -1 > Mention in the comment that it can also be replaced in replaceCardinality() Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@132 PS1, Line 132: d la > Does it need to be a Long? Can't we use -1 as the value indicating an inval I use Long here in case the original value cardinality_ itself is unknown (-1). Therefore, after replaceCardinality() is called, (cardinality_=10, originalCardinality_=null) can be differentiated from (cardinality_=10, originalCardinality_=-1). http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@386 PS1, Line 386: if (originalCardinality_ != null) { > Shouldn't we add some explanation here about what the value in parentheses Added "changed from". 
Please check the output in tpcds-processing-cost.test and let me know if this is OK or too verbose. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@534 PS1, Line 534: refine > Nit: refines. Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@547 PS1, Line 547: side > Nit: do we need 'hand' here? Removed. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@585 PS1, Line 585: ace("refineCardin > Could add a precondition check that 'nodeStack_' is not empty. Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592 PS1, Line 592: Filter filter : scan.ge > Could extract it into a variable. Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592 PS1, Line 592: ator.Runtim > Could use Map::computeIfAbsent() here. Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@621 PS1, Line 621: consider > Nit: considers. Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@658 PS1, Line 658: require > Nit: requires. 
Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@640 PS1, Line 640: for (RuntimeFilterTarget target : targets_) { > I think it would be cleaner if we inverted the if and returned from there - Done http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658 PS1, Line 658: > Can we check if it is a Bloom filter? We could add a Precondition check. I chose to clarify with comment instead. Calling getEstFpp() on other kind of filter is still legal, but will return 0. Please review if it is OK or not. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 18 Oct 2023 00:31:32 + Gerrit-HasComments: Yes
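The reply about `Long` versus `-1` in the message above can be made concrete with a small sketch. The field names `cardinality_` and `originalCardinality_`, the method name `replaceCardinality()`, and the "changed from" wording come from the thread; the class itself, the exact explain format, and the choice to remember only the first original value are assumptions made for illustration.

```java
// Hypothetical sketch; this is not Impala's PlanNode.
final class CardinalityHolderSketch {
  private long cardinality_;          // -1 means the estimate is unknown
  private Long originalCardinality_;  // null means the cardinality was never replaced

  CardinalityHolderSketch(long cardinality) {
    cardinality_ = cardinality;
  }

  // Replaces the estimate and remembers the first original value (an assumption).
  void replaceCardinality(long newCardinality) {
    if (originalCardinality_ == null) originalCardinality_ = cardinality_;
    cardinality_ = newCardinality;
  }

  // Mentions the original value only when a replacement actually happened.
  String explain() {
    StringBuilder sb = new StringBuilder("cardinality=").append(cardinality_);
    if (originalCardinality_ != null) {
      sb.append(" (changed from ").append(originalCardinality_).append(")");
    }
    return sb.toString();
  }
}
```

With a primitive `long` and `-1` as the "unset" marker, a node whose original estimate was unknown (-1) and was later replaced would look identical to a node that was never touched; the nullable `Long` keeps those two states apart.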
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Hello Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20498 to look at the new patch set (#2). Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. This cardinality reduction is currently only applied in cost-based planning mode (COMPUTE_PROCESSING_COST option is True) to avoid potential regression in regular planning mode. Cost-based planning mode can benefit the most from reduced scan cardinality. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and an increased chance of query assignment to the smaller executor group set. We can consider enabling this cardinality reduction technique for all planning mode after more thorough performance evaluation (which requires more planner test changes). Testing: - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. 
Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test 5 files changed, 465 insertions(+), 253 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20498/2 -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 1: (16 comments) I can't say I understand the change, but here are some comments. http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@18 PS1, Line 18: select Nit: selects http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@20 PS1, Line 20: produce Nit: produces. http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25 PS1, Line 25: POCESSING Typo: PROCESSING http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25 PS1, Line 25: This is because It's not clear to me how it explains that the cardinality reduction is only applied if the query option is on (as opposed to always being applied). http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@128 PS1, Line 128: protected long cardinality_; Mention in the comment that it can also be replaced in replaceCardinality(). http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@132 PS1, Line 132: Long Does it need to be a Long? Can't we use -1 as the value indicating an invalid/unset state, like in the case of 'cardinality_'? http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@386 PS1, Line 386: expBuilder.append("(") Shouldn't we add some explanation here about what the value in parentheses means? Or would it be too much here? 
http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@534 PS1, Line 534: refine Nit: refines. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@547 PS1, Line 547: hand Nit: do we need 'hand' here? http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@585 PS1, Line 585: nodeStack_.get(0) Could add a precondition check that 'nodeStack_' is not empty. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592 PS1, Line 592: filter.getSrc().getId() Could extract it into a variable. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592 PS1, Line 592: containsKey Could use Map::computeIfAbsent() here. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@621 PS1, Line 621: consider Nit: considers. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@658 PS1, Line 658: require Nit: requires. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@640 PS1, Line 640: for (RuntimeFilterTarget target : targets_) { I think it would be cleaner if we inverted the if and returned from there - 'continue' would not be needed. http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658 PS1, Line 658: only valid for bloom filter. Can we check if it is a Bloom filter? 
We could add a Precondition check. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 17 Oct 2023 09:27:31 + Gerrit-HasComments: Yes
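Two of the Planner.java suggestions in the review above (extract `filter.getSrc().getId()` into a variable, and replace a `containsKey` check with `Map::computeIfAbsent()`) follow a common Java pattern, sketched below. The thread does not show what the map in Planner.java actually stores, so the grouping of filter names by source node id is a hypothetical example, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the map contents are an assumption, not Impala code.
final class FilterGroupingSketch {
  // Groups filter names by the id of the node that produces them.
  static Map<Integer, List<String>> groupBySource(int[] srcIds, String[] filterNames) {
    Map<Integer, List<String>> bySource = new HashMap<>();
    for (int i = 0; i < srcIds.length; i++) {
      // Extracted once into a variable instead of repeating the lookup expression.
      int srcId = srcIds[i];
      // Replaces: if (!map.containsKey(srcId)) map.put(srcId, new ArrayList<>());
      bySource.computeIfAbsent(srcId, k -> new ArrayList<>()).add(filterNames[i]);
    }
    return bySource;
  }
}
```

`computeIfAbsent` creates the list only on the first occurrence of a key and returns the existing list afterwards, so the check-then-put sequence collapses into one call.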
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14042/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 20 Sep 2023 23:21:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20498 ) Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction .. IMPALA-12018: Consider runtime filter for cardinality reduction Currently Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm select a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produce a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. This cardinality reduction is currently only applied if COMPUTE_POCESSING_COST option is True. This is because multiple executor group set setup can benefit the most from reduced scan cardinality. It can lead towards ProcessingCost reduction, lower scan fragment parallelism, and an increased chance of query assignment to the smaller executor group set. We can consider enabling this for all cases after more thorough performance evaluation (which requires more planner test changes). Testing: - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. 
Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 --- M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test 5 files changed, 456 insertions(+), 252 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20498/1 -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto