[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-22 Thread Qifan Chen (Code Review)
Qifan Chen has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Reviewed-on: http://gerrit.cloudera.org:8080/17821
Reviewed-by: Qifan Chen 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 147 insertions(+), 29 deletions(-)

Approvals:
  Qifan Chen: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 18
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 17:17:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-22 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> In some cases, FastLimitCheckExceededRows is 0,
Yeah, the new code ignores the spilled partitions. Done.



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 13:26:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7480/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 11:01:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7479/


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 10:21:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9482/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 04:17:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7479/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 04:03:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> See my other comment.
In some cases, FastLimitCheckExceededRows is 0,
e.g. SET debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0;


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 1
> I see. What I mean is to assure the value is greater than 0.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 04:02:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#16).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 147 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/16
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-20 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 15:

(2 comments)

Just like to double check on the expected test results.

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> I modified the unit test because autotest failed.FastLimitCheckExceededRows
See my other comment.


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 0
> Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, the
I see. What I mean is to assure the value is greater than 0.

So maybe use [1-9][0-9]*?



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 20 Sep 2021 16:13:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 15: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7477/


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Sat, 18 Sep 2021 12:39:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-18 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> nit. 1?
I modified the unit test because autotest failed.FastLimitCheckExceededRows can 
be unidigit, double digits, three digits, so may contain 0


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 0
> nit. Should it be 1-9 instead?
Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, then 
check whether the number of rows returned more than limit, so 
FastLimitCheckExceededRows can be unidigit, double digits, three digits, so may 
contain 0



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Sat, 18 Sep 2021 07:04:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9469/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Sat, 18 Sep 2021 06:33:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7477/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Sat, 18 Sep 2021 06:18:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-18 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#15).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 147 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/15
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 14: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7470/


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 14 Sep 2021 18:32:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 14:

(2 comments)

Looks super!

Just a minor comment on the specs for skipped rows in two query tests.
Otherwise, good for +2.

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450: 0
nit. 1?


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 0
nit. Should it be 1-9 instead?



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 14 Sep 2021 13:27:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9456/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 14 Sep 2021 12:30:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7470/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 14 Sep 2021 12:12:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134
PS13, Line 134:  runtime_profile_->AddInfoString("FastLimitCheckExceededRows",
  :   SimpleItoa(aggs_[0]->GetNumKeys() - limit()));
  :   VLOG_QUERY << S
> If we can add the info to runtime_profile_, it will be more useful. For exa
Done


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446
PS13, Line 446: GINT
> Can we also verify that some rows are indeed skipped in spill situation?
Done


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726
PS13, Line 2726:  speed up aggregations
> Can we verify that most of the rows are indeed skipped fast?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 14 Sep 2021 12:08:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#14).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 149 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/14
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 13:

(4 comments)

I think if we can improve the observability a little bit, it will be great.

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134
PS13, Line 134:  VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() 
<< ") returned"
  :   " from the streaming aggregation node has 
exceeded the limit of "
  :   << limit();
If we can add the info to runtime_profile_, it will be more useful. For 
example, to verify that the feature is able to kick in in query tests.

runtime_profile_->AddInfoString("Hdfs Read Thread Concurrency Bucket", 
ss.str());


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> where id = subquery,If this subQuery returns 2 rows, we can sure that it is
Okay. Looks this is a badly written query when it returns more one row. My 
fault.

The following version runs fine on my box and I suppose your new feature should 
not kick in.

select * from functional.alltypes where id in   

  (select i from (select bigint_col as i from functional.alltypes 
  union   
  select tinyint_col as i from functional.alltypes) t
)   
   
;


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446
PS13, Line 446: Verify
Can we also verify that some rows are indeed skipped in spill situation?


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726
PS13, Line 2726:  speed up aggregations
Can we verify that most of the rows are indeed skipped fast?



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 13 Sep 2021 15:15:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9452/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 13 Sep 2021 07:01:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 13:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10
PS12, Line 10:
> nit. exceed the max line threshold.
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76
PS12, Line 76:  VLOG_QUERY << Substitute("the number of rows ($0) returned from 
the aggregation"
 :   " node has exceeded the limit of $1", 
aggs_[0]->GetNumKeys(), limit());
> nit. May use Substitute() which is faster.
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479
PS12, Line 479: compl
> nit. complete
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646
PS12, Line 646: When both conditions below are true, aggregati
> nit. I think we should mention the two conditions in the commit message her
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648
PS12, Line 648: on n
> nit. May use Complete, as Halt implies stop for some reason.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> Since we do not push down id from the outer side to the inner, I would thin
where id = subquery,If this subQuery returns 2 rows, we can sure that it is not 
meet the semantic requirement, we should report a semantic error.We don't need 
to get all the results.



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 13 Sep 2021 06:40:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#13).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
18 files changed, 142 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/13
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-10 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 12:

(6 comments)

Looks great.

The comment on subquery rewrite test is orthogonal to this patch.

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10
PS12, Line 10: pre-aggregation
nit. exceed the max line threshold.


http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76
PS12, Line 76:  VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() 
<< ") returned"
 :   " from the aggregation node has exceeded the limit 
of " << limit();
nit. May use Substitute() which is faster.

Example:
orc-column-readers.cc:  return Substitute("$0 column (ORC id=$1)", 
node->toString(), node->getColumnId());

sort-node.cc:  VLOG(3) << Substitute("Sort estimation: 
estimated_full_input_size=$0 "


http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479
PS12, Line 479: halt
nit. complete


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646
PS12, Line 646: For SELECT DISTINCT f1,f2,...fn FROM t LIMIT n
nit. I think we should mention the two conditions in the commit message here. 
The above SQL is an example.


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648
PS12, Line 648: Halt
nit. May use Complete, as Halt implies stop for some reason.


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a
Since we do not push down id from the outer side to the inner, I would think we 
need to have ALL distinct values from the union side.

If limit 2 is for the # of groups (i.e., taking only 2 groups, or 2 distinct 
values), then is it right?



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 10 Sep 2021 13:40:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9448/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 10 Sep 2021 10:01:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-10 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 12:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG@10
PS11, Line 10: When both con
> nit. When both conditions below are true,
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h
File be/src/exec/aggregation-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h@71
PS11, Line 71:   bool fast_limit_check_ = false;
> nit. May init to false here.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc@93
PS11, Line 93:
 : Status AggregationNodeBase::Prepare(RuntimeState* state) {
 :   SCOPED_TIMER(runtime_profile_->total_time_counter());
 :   RETURN_IF_ERROR(ExecNode::Prepare(state));
 :   for (auto& agg : aggs_) {
 : agg->SetDebugOptions
> I also wonder if fast_limit_check_ can be moved to the plan node for Aggreg
Done Only GroupingAggregator has limit, maybe limit should not be added to 
TAggregator


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc@75
PS11, Line 75: os = true;
 :   VLOG_QUERY << "the
> nit. May make the message more specific by plugging in the numbers. For exa
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@129
PS11, Line 129: DCHECK(limit() > -1);
> nit. We probably should DCHECK(limit() > -1) here before the IF.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@133
PS11, Line 133: child_batch_->Reset();
  :   VLOG_QUERY << "the
> nit. same as the other one to make the message more specific.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> nit. I wonder if this correct. On paper, we need to look at all distinct va
This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a 
scalar subquery, the number of rows returned by this subquery cannot be greater 
than 1, so set limit = 2, and then check cardinality of result of subquery


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test@2731
PS11, Line 2731:
> This violates condition 1?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 10 Sep 2021 09:39:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-10 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#12).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
18 files changed, 140 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/12
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-09 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 11:

(8 comments)

Looks pretty good.

http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG@10
PS11, Line 10: In some cases
nit. When both conditions below are true,


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h
File be/src/exec/aggregation-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h@71
PS11, Line 71:   bool fast_limit_check_;
nit. May init to false here.


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc@93
PS11, Line 93: if (num_aggs == 1 && limit() != -1 && 
pnode.aggs_[0]->GetNumGroupingExprs() > 0
 :   && pnode.aggs_[0]->aggregate_functions_.empty()
 :   && pnode.aggs_[0]->conjuncts_.empty()) {
 : fast_limit_check_ = true;
 : GroupingAggregator* gpAgg = 
dynamic_cast(aggs_[0].get());
 : gpAgg->UnsetLimit();
I also wonder if fast_limit_check_ can be moved to the plan node for 
Aggregation and set by FE.

The same comment goes to limit. That is, set the limit in TAggregator in FE.

In this way, the solution is cleaner.


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc@75
PS11, Line 75: LOG_QUERY << "the number of rows returned by agg node has 
exceeded"
 :   " the limit.";
nit. May make the message more specific by plugging in the numbers. For 
example, the number of rows (3) returned from the aggregation node has exceeded 
the limit of 2.


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@129
PS11, Line 129: if (aggs_[0]->GetNumKeys() >= limit()) {
nit. We probably should DCHECK(limit() > -1) here before the IF.


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@133
PS11, Line 133: VLOG_QUERY << "the number of rows returned by streaming agg 
node has exceeded"
  :   " the limit.";
nit. same as the other one to make the message more specific.


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
nit. I wonder if this correct. On paper, we need to look at all distinct values 
(via grouping) to make a decision.


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test@2731
PS11, Line 2731: count(*)
This violates condition 1?



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 11
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 09 Sep 2021 15:08:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-09 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#11).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 129 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/11
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 11
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 10
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 02 Sep 2021 15:50:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7452/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 10
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 02 Sep 2021 09:37:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-02 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#10).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 129 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/10
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 10
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#3).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 128 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/3
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc@91
PS2, Line 91:   // node has exceeded the limit we can complete the query 
without calculating all the data
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h@217
PS2, Line 217:   void UnsetLimit() { limit_ = -1; }
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 07:26:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7444/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 07:25:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17821


Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 127 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/2
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao