Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> isMultiPhase encompasses all nodes that are part of a chain of aggregation
I see.

Can you run a simple performance test to see whether this would negatively 
affect queries for which a very small subset of non-merge aggregation nodes 
can provide the answer?

For example, let us partition table T on columns a, b into 10 partitions, 
with each partition sorted on a, b. The query is
select distinct a, b from T limit 2.

Normally, such a query can finish as soon as the two smallest subsets of rows 
(on a, b) are read in.

From reading the code here, my understanding is that with this change we 
cannot complete until all read nodes (from the 10 partitions) have finished 
their work, and we can complete early only once the very top merge node is 
active. True?
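
To make the concern concrete, here is a toy model of the two behaviors. It is 
not Impala code: the class DistinctLimitSketch and its methods are 
hypothetical, and it assumes that without a per-node limit every partition is 
read to the end, which is exactly the point I am asking you to confirm or 
refute with a real test.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not Impala code. It only counts rows read.
class DistinctLimitSketch {

    // Builds `partitions` lists of `rowsPerPartition` rows, each row a
    // distinct "a,b" key, in sorted insertion order within the partition.
    static List<List<String>> makePartitions(int partitions, int rowsPerPartition) {
        List<List<String>> out = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<String> rows = new ArrayList<>();
            for (int r = 0; r < rowsPerPartition; r++) {
                rows.add("a" + p + ",b" + r);
            }
            out.add(rows);
        }
        return out;
    }

    // Rows read if each non-merge aggregation stops once it holds `limit`
    // distinct groups.
    static int rowsReadWithNodeLimit(List<List<String>> partitions, int limit) {
        int read = 0;
        for (List<String> rows : partitions) {
            Set<String> groups = new HashSet<>();
            for (String row : rows) {
                if (groups.size() >= limit) break;
                read++;
                groups.add(row);
            }
        }
        return read;
    }

    // Rows read if only the top merge node applies the limit and
    // (assumption) no reader is stopped early.
    static int rowsReadWithMergeOnlyLimit(List<List<String>> partitions) {
        int read = 0;
        for (List<String> rows : partitions) {
            read += rows.size();
        }
        return read;
    }

    public static void main(String[] args) {
        List<List<String>> parts = makePartitions(10, 1000);
        System.out.println(rowsReadWithNodeLimit(parts, 2));    // prints 20
        System.out.println(rowsReadWithMergeOnlyLimit(parts));  // prints 10000
    }
}
```

If the real plan behaves like the second method for this query, the 
difference in rows read should show up clearly in the query profile.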



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:52:01 +0000
Gerrit-HasComments: Yes
