Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/22032 )
Change subject: IMPALA-13086: Lower AggregationNode estimate using stats predicate
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22032/2/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/22032/2/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@417
PS2, Line 417:     // This is done via memo lookup through analyzer.getProducingNode().
It's not clear to me how this reduces estimates in some cases. Is it because we can now look deeper in some instances than we could before?


http://gerrit.cloudera.org:8080/#/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test:

http://gerrit.cloudera.org:8080/#/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test@138
PS2, Line 138: |  runtime filters: RF000[bloom] <- customer_id, RF001[min_max] <- customer_id
This has a significant effect on the Q4 plan. How does it affect execution performance?


http://gerrit.cloudera.org:8080/#/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test:

http://gerrit.cloudera.org:8080/#/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test@108
PS2, Line 108: |--29:HASH JOIN [INNER JOIN]
Joins are also re-ordered here.

http://gerrit.cloudera.org:8080/#/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test:

http://gerrit.cloudera.org:8080/#/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test@2048
PS2, Line 2048: |  tuple-ids=3 row-size=50B cardinality=1.82K cost=1948896250
Have you done any sanity checks to see whether these new estimates are reasonable compared with actual execution?



-- 
To view, visit http://gerrit.cloudera.org:8080/22032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia840d68f1c4f126d4e928461ec5c44545dbf25f8
Gerrit-Change-Number: 22032
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 07 Nov 2024 22:21:30 +0000
Gerrit-HasComments: Yes
