Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20009 )

Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@98
PS4, Line 98: the either of
> nit: either the input ..
Ack


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@104
PS4, Line 104: content
> nit: 'contain'
Ack


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@253
PS4, Line 253:         unknownEstimate = true;
> If we get an unknown estimate for any one aggregate class, even though the
Yes, this is to initialize num groups for each class (aggClassNumGroups_). This 
is needed later to improve memory estimate of each class that is happen in 
computeNodeResourceProfile() & computeAggClassResourceProfile(), and processing 
cost in computeProcessingCost().

Having said that, I just realize one potential bug in line 560 that I'll 
comment separately.


http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@560
PS4, Line 560: Math.min(aggInputCardinality_, 
prevAgg.aggClassNumGroups_.get(aggIdx));
This should check if prevAgg.aggClassNumGroups_.get(aggIdx) == -1, which in 
that case it should return  aggInputCardinality_.


http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test:

http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@322
PS4, Line 322: 3373800
> Could you clarify how this cost changed to 1/5th of the original cost ? The
ProcessingCost of an agg node is the sum of ProcessingCost of each agg class. 
One factor to ProcessingCost is number of rows to process (input cardinality).

Before this patch, ProcessingCost for all class assume the same number of input 
cardinality, that is getChild(0).getCardinality(). The reduction in 
ProcessingCost here happens because we make input cardinality more granular per 
agg class.



--
To view, visit http://gerrit.cloudera.org:8080/20009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
Gerrit-Change-Number: 20009
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 07 Jun 2023 15:27:25 +0000
Gerrit-HasComments: Yes

Reply via email to