Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20009 )
Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases ...................................................................... Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@98 PS4, Line 98: the either of > nit: either the input .. Ack http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@104 PS4, Line 104: content > nit: 'contain' Ack http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@253 PS4, Line 253: unknownEstimate = true; > If we get an unknown estimate for any one aggregate class, even though the Yes, this is to initialize num groups for each class (aggClassNumGroups_). This is needed later to improve memory estimate of each class that is happen in computeNodeResourceProfile() & computeAggClassResourceProfile(), and processing cost in computeProcessingCost(). Having said that, I just realize one potential bug in line 560 that I'll comment separately. http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@560 PS4, Line 560: Math.min(aggInputCardinality_, prevAgg.aggClassNumGroups_.get(aggIdx)); This should check if prevAgg.aggClassNumGroups_.get(aggIdx) == -1, which in that case it should return aggInputCardinality_. http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test: http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@322 PS4, Line 322: 3373800 > Could you clarify how this cost changed to 1/5th of the original cost ? The ProcessingCost of an agg node is the sum of ProcessingCost of each agg class. One factor to ProcessingCost is number of rows to process (input cardinality). Before this patch, ProcessingCost for all class assume the same number of input cardinality, that is getChild(0).getCardinality(). The reduction in ProcessingCost here happens because we make input cardinality more granular per agg class. -- To view, visit http://gerrit.cloudera.org:8080/20009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95 Gerrit-Change-Number: 20009 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 07 Jun 2023 15:27:25 +0000 Gerrit-HasComments: Yes
