Hello Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#2).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
......................................................................

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and then looks for runtime
filters based on the complete plan. This means the cardinality
estimates in the query plan do not incorporate runtime filter
selectivity. The actual scan cardinality observed at runtime is often
much lower than the estimate because of runtime filters.

This patch applies runtime filter selectivity to lower the cardinality
estimates of scan nodes and of certain join nodes above them. It runs
after runtime filter generation and before resource requirement
computation. The algorithm selects a contiguous probe pipeline
consisting of a scan node, exchanges, and reducing join nodes.
Depending on whether a join node produces a runtime filter, and on the
type of that filter, the algorithm applies the filter's selectivity to
the scan node to reduce its cardinality and input cardinality
estimates.

This cardinality reduction is currently applied only in cost-based
planning mode (COMPUTE_PROCESSING_COST option is True) to avoid
potential regressions in regular planning mode. Cost-based planning
mode benefits the most from reduced scan cardinality: it can lead to
lower ProcessingCost, lower scan fragment parallelism, and a higher
chance that the query is assigned to the smaller executor group set.
We can consider enabling this technique for all planning modes after a
more thorough performance evaluation (which requires more planner test
changes).

Testing:
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
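For reviewers who want a concrete picture of the pipeline walk described above, here is a minimal sketch. It is not the code in this patch. The class, the `Node` type, and `reduceScanCardinality` are hypothetical names, and two rules are assumptions made only for illustration: a filter's selectivity is taken as the join's output cardinality divided by its probe input cardinality, and when several filters apply, the most selective one is used. The sketch also ignores the filter type, which the patch does take into account.

```java
import java.util.List;

public class RuntimeFilterCardinalitySketch {
  // Hypothetical node kinds; the real planner has many more PlanNode subclasses.
  public enum Kind { SCAN, EXCHANGE, JOIN }

  public static final class Node {
    final Kind kind;
    final long cardinality;               // estimated output rows
    final boolean producesRuntimeFilter;  // only meaningful for JOIN

    public Node(Kind kind, long cardinality, boolean producesRuntimeFilter) {
      this.kind = kind;
      this.cardinality = cardinality;
      this.producesRuntimeFilter = producesRuntimeFilter;
    }
  }

  /**
   * Returns a reduced cardinality estimate for the scan at the bottom of a probe
   * pipeline. The list is ordered bottom-up: the scan first, then each node above
   * it along the probe side.
   */
  public static long reduceScanCardinality(List<Node> pipeline) {
    if (pipeline.isEmpty() || pipeline.get(0).kind != Kind.SCAN) {
      throw new IllegalArgumentException("pipeline must start with a scan node");
    }
    long scanCard = pipeline.get(0).cardinality;
    long probeCard = scanCard;  // rows flowing into the next join's probe side
    double selectivity = 1.0;

    for (Node node : pipeline.subList(1, pipeline.size())) {
      // Exchanges forward rows unchanged, so they stay inside the pipeline.
      if (node.kind == Kind.EXCHANGE) continue;
      // The contiguous pipeline ends at the first node that is not a reducing join.
      if (node.kind != Kind.JOIN || node.cardinality >= probeCard) break;
      if (node.producesRuntimeFilter) {
        // Assumption: selectivity = join output rows / probe input rows.
        double joinSelectivity = (double) node.cardinality / probeCard;
        // Assumption: keep only the most selective filter.
        selectivity = Math.min(selectivity, joinSelectivity);
      }
      probeCard = node.cardinality;
    }

    if (selectivity >= 1.0) return scanCard;  // no applicable filter found
    return Math.max(1L, Math.round(scanCard * selectivity));
  }
}
```

For a pipeline of scan (1000 rows), exchange, a filter-producing join estimated at 250 rows, and a second filter-producing join estimated at 125 rows, the sketch lowers the scan estimate to 250. A join that does not reduce its probe input ends the walk, so nothing above it affects the scan.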
Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
5 files changed, 465 insertions(+), 253 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20498/2
--
To view, visit http://gerrit.cloudera.org:8080/20498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>