Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 )
Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786:         && isMultiPhase()
> isMultiPhase encompasses all nodes that are part of a chain of aggregation
I see.

Can you run a simple performance test to check whether this negatively affects queries where a very small subset of the non-merge aggregate nodes can provide the answer?

For example, suppose table T is partitioned on columns a, b into 10 partitions and sorted on a, b. The query is

  select distinct a, b from T limit 2

Normally, such a query can finish as soon as the two smallest subsets of rows (on a, b) have been read in.

From reading the code here, my understanding is that with this change we cannot complete early until all read nodes (from the 10 partitions) have finished their work, and that we can complete early only once the very top merge node is active. True?


-- 
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:52:01 +0000
Gerrit-HasComments: Yes
