Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20009 )

Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@98
PS4, Line 98: either the in
> Ack

Done


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@104
PS4, Line 104: contain
> Ack

Done


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@253
PS4, Line 253: unknownEstimate = true;
> Yes, this is to initialize num groups for each class (aggClassNumGroups_).

Done


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@560
PS4, Line 560: gClassNumGroup = prevAgg.aggClassNumGroups_.get(aggIdx);
> This should check if prevAgg.aggClassNumGroups_.get(aggIdx) == -1, which in

Done


http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test:

http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@322
PS4, Line 322: 3373800
> ProcessingCost of an agg node is the sum of ProcessingCost of each agg clas

I added some debug logging. These are the num-groups estimates and the capping applied when planning against the tpcds_parquet db:

Agg 215:AGGREGATE compute stats. aggPhase_=FIRST_MERGE
Agg 215:AGGREGATE aggInputCardinality=276963
ndvBasedNumGroups=12052800
Agg 215:AGGREGATE Class 0 numGroups=276963
ndvBasedNumGroups=401760
Agg 215:AGGREGATE Class 1 numGroups=276963
ndvBasedNumGroups=8370
Agg 215:AGGREGATE Class 2 numGroups=8370
ndvBasedNumGroups=3
Agg 215:AGGREGATE Class 3 numGroups=3
ndvBasedNumGroups=1
Agg 215:AGGREGATE Class 4 numGroups=1

The total numGroups across all classes is 562300 (276963 + 276963 + 8370 + 3 + 1). Before this patch, the processing cost assumed a cardinality of 562300 for every class (562300 * 5 = 2811500). After this patch, the five classes use their own estimates of 276963, 276963, 8370, 3, and 1 respectively, for a total of 562300.



--
To view, visit http://gerrit.cloudera.org:8080/20009

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
Gerrit-Change-Number: 20009
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Comment-Date: Wed, 07 Jun 2023 17:04:48 +0000
Gerrit-HasComments: Yes
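The per-class capping and cost arithmetic from the reply on tpcds-processing-cost.test line 322 can be sketched in Java as follows. This is a minimal illustration and not Impala's AggregationNode code. The class AggClassClampSketch, its methods, and the UNKNOWN = -1 sentinel are hypothetical. The sketch also assumes two things. First, based on the log layout, each ndvBasedNumGroups value belongs to the class logged immediately after it. Second, -1 marks an unknown estimate, as the truncated comment on line 560 hints.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Impala's AggregationNode): clamp each aggregation
 * class's NDV-based num-groups estimate to the aggregation input cardinality,
 * then compare the old and new cardinalities used for processing cost.
 */
public class AggClassClampSketch {
  // Assumed sentinel for an unknown estimate.
  static final long UNKNOWN = -1;

  /** Clamp one class's NDV-based estimate to the input cardinality. */
  static long clampNumGroups(long ndvBasedNumGroups, long inputCardinality) {
    // Assumption: with no NDV estimate, fall back to the input cardinality
    // (which may itself be UNKNOWN).
    if (ndvBasedNumGroups == UNKNOWN) return inputCardinality;
    if (inputCardinality == UNKNOWN) return ndvBasedNumGroups;
    return Math.min(ndvBasedNumGroups, inputCardinality);
  }

  /** Clamp every class independently. */
  static List<Long> perClassNumGroups(long[] ndvBased, long inputCardinality) {
    List<Long> result = new ArrayList<>();
    for (long ndv : ndvBased) result.add(clampNumGroups(ndv, inputCardinality));
    return result;
  }

  /** Sum of per-class estimates; UNKNOWN if any class is unknown. */
  static long total(List<Long> numGroups) {
    long sum = 0;
    for (long n : numGroups) {
      if (n == UNKNOWN) return UNKNOWN;  // do not add -1 into the sum
      sum += n;
    }
    return sum;
  }

  public static void main(String[] args) {
    // Values from the Agg 215 debug log (tpcds_parquet).
    long inputCardinality = 276963;
    long[] ndvBased = {12052800, 401760, 8370, 3, 1};

    List<Long> numGroups = perClassNumGroups(ndvBased, inputCardinality);
    long totalNumGroups = total(numGroups);

    // Before the patch: every class was charged with the node-wide total.
    long before = totalNumGroups == UNKNOWN
        ? UNKNOWN : totalNumGroups * numGroups.size();
    // After the patch: each class is charged with its own estimate.
    long after = totalNumGroups;

    System.out.println("numGroups per class = " + numGroups);
    System.out.println("total = " + totalNumGroups);
    System.out.println("before = " + before + ", after = " + after);
  }
}
```

Running main prints numGroups per class = [276963, 276963, 8370, 3, 1], total = 562300, and before = 2811500, after = 562300, matching the figures in the reply. Returning UNKNOWN from total() instead of summing the -1 sentinels is one way to respect the -1 check raised on line 560, so that an unknown class cannot silently lower the node total.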
