Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20498 )
Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction ...................................................................... IMPALA-12018: Consider runtime filter for cardinality reduction Currently, Impala creates a plan first and looks for runtime filters based on the complete plan. This means the cardinality estimate in the query plan does not incorporate runtime filter selectivity. Actual scan cardinality from runtime execution is often much lower than the cardinality estimate due to the existence of runtime filters. This patch applies runtime filter selectivity to lower cardinality estimates of scan nodes and certain join nodes above them after runtime filter generation and before resource requirement computation. The algorithm selects a contiguous probe pipeline consisting of a scan node, exchanges, and reducing join nodes. Depending on whether the join node produces a runtime filter and the type of that runtime filter, it then applies the runtime filter selectivity to the scan node to reduce its cardinality and input cardinality estimate. The runtime filter selectivity is calculated with the simplest join cardinality formula (JoinNode.computeGenericJoinCardinality()). The reduced cardinality is stored in new fields 'filteredCardinality_' and 'filteredInputCardinality_', separate from existing fields 'cardinality_' and 'inputCardinality_'. Future work should merge the new cardinality fields with the old cardinality fields after we can validate that the cardinality reduction does not regress memory estimation. While this cardinality reduction is present in all execution modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based planning mode will be the primary beneficiary of this patch. It can lead toward ProcessingCost reduction, lower scan fragment parallelism, lower CpuAsk, and increase the chance of query assignment to the smaller executor group set. Other execution modes will see no change in their execution parallelism or memory estimates. This patch also adds development query option named RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a range of [0.0..1.0] that controls the cardinality reduction scale from runtime filter analysis to help with benchmarking and disabling cardinality reduction if needed (by setting to 0.0). Default to 1.0. Testing: - Add fe test testRuntimeFilterCardinalityReduction and testRuntimeFilterCardinalityReductionOnKudu - Pass test_executor_groups.py. - Pass PlannerTest#testProcessingCost. - Add be test QueryOptions.SetFractionalOptions - Ran full TPC-DS 3TB benchmark and see no regression due to query plan change. - Pass core tests. Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Reviewed-on: http://gerrit.cloudera.org:8080/20498 Reviewed-by: Impala Public Jenkins <[email protected]> Tested-by: Impala Public Jenkins <[email protected]> --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test 117 files changed, 2,648 insertions(+), 1,294 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 21 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]>
