Riza Suminto has posted comments on this change. (http://gerrit.cloudera.org:8080/20009)

Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@98
PS4, Line 98: either the in
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@104
PS4, Line 104: contain
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@253
PS4, Line 253:         unknownEstimate = true;
> Yes, this is to initialize num groups for each class (aggClassNumGroups_).
Done


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@560
PS4, Line 560: gClassNumGroup = prevAgg.aggClassNumGroups_.get(aggIdx);
> This should check if prevAgg.aggClassNumGroups_.get(aggIdx) == -1, which in
Done
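The review thread on line 560 asks that the per-class estimate carried over from the previous aggregation phase (`prevAgg.aggClassNumGroups_.get(aggIdx)`) be checked for -1 before it is reused. Below is a minimal sketch of that check. It assumes that -1 is the "unknown" sentinel, which is the usual Impala planner convention for cardinalities. It also assumes, hypothetically, that the code falls back to the current phase's own estimate when the previous value is unknown. `PerClassNumGroups`, `initUnknown`, `resolveClassNumGroups`, and the fallback rule are illustrative names and behavior. None of them come from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not the actual AggregationNode code. */
final class PerClassNumGroups {
  /** Assumed sentinel for an unknown estimate (Impala's usual -1 convention). */
  static final long UNKNOWN = -1L;

  /** Creates one num-groups entry per aggregation class, all initially unknown. */
  static List<Long> initUnknown(int numClasses) {
    return new ArrayList<>(Collections.nCopies(numClasses, Long.valueOf(UNKNOWN)));
  }

  /**
   * Returns the num-groups estimate for class aggIdx. The previous phase's value is
   * reused only when it is known; otherwise this phase's own estimate is used
   * (the fallback rule is an assumption for illustration).
   */
  static long resolveClassNumGroups(List<Long> prevAggClassNumGroups, int aggIdx,
      long ownEstimate) {
    long prev = prevAggClassNumGroups.get(aggIdx);
    return prev == UNKNOWN ? ownEstimate : prev;
  }
}
```

Keeping an explicit per-class list, rather than a single node-level number, lets each merge phase look up the estimate for its own class.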


http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test:

http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@322
PS4, Line 322: 3373800
> ProcessingCost of an agg node is the sum of ProcessingCost of each agg clas
I added some debug logging. These are the num-groups estimates and the capping 
that happens when running against the tpcds_parquet db:

 Agg 215:AGGREGATE compute stats. aggPhase_=FIRST_MERGE
 Agg 215:AGGREGATE aggInputCardinality=276963
 ndvBasedNumGroups=12052800
 Agg 215:AGGREGATE Class 0 numGroups=276963
 ndvBasedNumGroups=401760
 Agg 215:AGGREGATE Class 1 numGroups=276963
 ndvBasedNumGroups=8370
 Agg 215:AGGREGATE Class 2 numGroups=8370
 ndvBasedNumGroups=3
 Agg 215:AGGREGATE Class 3 numGroups=3
 ndvBasedNumGroups=1
 Agg 215:AGGREGATE Class 4 numGroups=1
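In the log above, each class's NDV-based estimate appears to be clamped to the aggregation input cardinality (276963). Classes 0 and 1 are reduced to that value. Classes 2 to 4 are already below the cap, so they stay unchanged. The sketch below reproduces the logged numbers. Treating the clamp as a plain `min` is an assumption read off the log, not taken from the patch. `NumGroupsClamp`, `clamp`, and `clampAll` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical illustration of per-class clamping; not the actual AggregationNode code. */
final class NumGroupsClamp {
  /** Clamps an NDV-based num-groups estimate to the aggregation input cardinality. */
  static long clamp(long ndvBasedNumGroups, long aggInputCardinality) {
    return Math.min(ndvBasedNumGroups, aggInputCardinality);
  }

  /** Applies the clamp independently to every aggregation class. */
  static long[] clampAll(long[] ndvBasedNumGroups, long aggInputCardinality) {
    long[] result = new long[ndvBasedNumGroups.length];
    for (int i = 0; i < ndvBasedNumGroups.length; i++) {
      result[i] = clamp(ndvBasedNumGroups[i], aggInputCardinality);
    }
    return result;
  }

  public static void main(String[] args) {
    // NDV-based estimates for classes 0..4 of Agg 215, taken from the log above.
    long[] ndv = {12052800L, 401760L, 8370L, 3L, 1L};
    long[] numGroups = clampAll(ndv, 276963L);
    System.out.println(Arrays.toString(numGroups)); // [276963, 276963, 8370, 3, 1]
  }
}
```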

Total numGroups across classes is 562300. Before this patch, the processing cost 
assumed a cardinality of 562300 for each class (562300 * 5 = 2811500). After this 
patch, each class uses its own estimate: 276963, 276963, 8370, 3, and 1, 
respectively (for a total of 562300).
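The arithmetic in the previous paragraph can be checked with a short sketch. It only compares the cardinality charged across the five aggregation classes before and after the patch. It does not model Impala's actual ProcessingCost computation, which applies its own per-row cost factors on top of these cardinalities. `ClassCardinalityComparison`, `chargedBefore`, and `chargedAfter` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical comparison of the cardinality charged per aggregation class. */
final class ClassCardinalityComparison {
  /** Before the patch (as described above): every class is charged the total across classes. */
  static long chargedBefore(long[] classNumGroups) {
    long total = Arrays.stream(classNumGroups).sum();
    return total * classNumGroups.length;
  }

  /** After the patch: each class is charged only its own num-groups estimate. */
  static long chargedAfter(long[] classNumGroups) {
    return Arrays.stream(classNumGroups).sum();
  }

  public static void main(String[] args) {
    // Per-class numGroups for Agg 215 after clamping.
    long[] classNumGroups = {276963L, 276963L, 8370L, 3L, 1L};
    System.out.println(chargedBefore(classNumGroups)); // 2811500
    System.out.println(chargedAfter(classNumGroups));  // 562300
  }
}
```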



--
To view, visit http://gerrit.cloudera.org:8080/20009

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
Gerrit-Change-Number: 20009
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 07 Jun 2023 17:04:48 +0000
Gerrit-HasComments: Yes
