[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Reviewed-on: http://gerrit.cloudera.org:8080/9949
Reviewed-by: Tianyi Wang 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/distinct.test
M tests/query_test/test_aggregation.py
9 files changed, 454 insertions(+), 24 deletions(-)

Approvals:
  Tianyi Wang: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 8
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 12 Apr 2018 22:01:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2298/


--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 12 Apr 2018 18:15:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/distinct.test
M tests/query_test/test_aggregation.py
9 files changed, 454 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/7
--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 12 Apr 2018 18:14:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2287/


--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Thu, 12 Apr 2018 01:39:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2287/


--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Wed, 11 Apr 2018 21:13:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Wed, 11 Apr 2018 20:54:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-10 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/distinct.test
M tests/query_test/test_aggregation.py
9 files changed, 454 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/6
--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.h@132
PS5, Line 132:   QUERY_OPT_FN(shuffle_distinct_exprs, SHUFFLE_DISTINCT_EXPRS, 
TQueryOptionLevel::REGULAR)\
This seems similarly ADVANCED as, e.g., DEFAULT_JOIN_DISTRIBUTION_MODE so let's 
move to ADVANCED.


http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.cc@635
PS5, Line 635:   case TImpalaQueryOptions::SHUFFLE_DISTINCT_EXPRS: {
I just realized we should also update the query-options-test.cc


http://gerrit.cloudera.org:8080/#/c/9949/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/9949/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@78
PS5, Line 78:   public void testNoShuffleOnDistinct() {
testShuffleByDistinctExprs()


http://gerrit.cloudera.org:8080/#/c/9949/5/testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test:

http://gerrit.cloudera.org:8080/#/c/9949/5/testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test@2
PS5, Line 2:  QUERY
Is the formatting of these tests correct? I believe we don't recognize " 
QUERY in planner tests" Or perhaps this works by chance? Other tests don't have 
this better to be consistent.


http://gerrit.cloudera.org:8080/#/c/9949/5/testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test@217
PS5, Line 217: select count(distinct a.int_col) from functional.alltypes a 
inner join [shuffle]
Add one more test where the input is partitioned by year, int_col (compatible 
with the desired partitioning with distinct exprs)


http://gerrit.cloudera.org:8080/#/c/9949/5/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/9949/5/tests/query_test/test_aggregation.py@336
PS5, Line 336: class TestDistinctAggregationQueries(ImpalaTestSuite):
TestDistinctAggregation


http://gerrit.cloudera.org:8080/#/c/9949/5/tests/query_test/test_aggregation.py@353
PS5, Line 353: if cls.exploration_strategy() == 'core':
Aren't we adding all formats in L345? I think we need to add a constraint here 
and not a new dimension.



--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Tue, 10 Apr 2018 03:49:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-09 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/distinct.test
M tests/query_test/test_aggregation.py
9 files changed, 371 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/5
--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 4:

(3 comments)

Looking closely into the tests now.

http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@889
PS3, Line 889: // grouping exprs in the second phase which is not required 
when omitting the distinct
> What do you mean by separating them? I suppose we always shuffle if there a
Nvm, current code is ok.


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@931
PS3, Line 931:   // phase-1 merge step.
> L929 adds phase-2 agg. Partition being compatible && no shuffle by distinct
Got it, for some reason I missed L929.


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@952
PS3, Line 952:   // step (which is where it should be)
> We don't need "fragments.add(firstMergeFragment)" if the partition is compa
Makes sense. Let's the two paths more uniform with:

if (!shuffleDistinctExpr) return firstMergeFragment;
fragments.add(firstMergeFragment);



--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Mon, 09 Apr 2018 22:37:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-09 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test
7 files changed, 173 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/4
--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9949/3/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/9949/3/be/src/service/query-options.h@44
PS3, Line 44:   TImpalaQueryOptions::SHUFFLE_DISTINCT_EXPR + 1);\
Let's call this SHUFFLE_DISTINCT_EXPRS because there could be several such exprs


http://gerrit.cloudera.org:8080/#/c/9949/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/9949/3/common/thrift/ImpalaInternalService.thrift@276
PS3, Line 276:   // When a query has both grouping exprs and a distinct expr, 
impala can shuffle by
* both grouping and distinct exprs
* Impala can optionally include the distinct exprs in the hash exchange of the 
first aggregation phase to spread the data among more nodes. However, this plan 
requires another  hash exchange on the grouping exprs in the second phase which 
is not required when when omitting the distinct exprs in the first phase. 
Shuffling by both is better if the grouping exprs have a low NDVs.


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@886
PS3, Line 886: // When a query has both grouping exprs and a distinct expr, 
impala can shuffle by
same comments as in .thrift


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@889
PS3, Line 889: boolean shuffleDistinctExpr = 
ctx_.getQueryOptions().shuffle_distinct_expr ||
shuffleDistinctExprs

I think it's cleaner to separate the value of the query option from 
hasGroupingExprs


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@931
PS3, Line 931:   if (!shuffleDistinctExpr) return firstMergeFragment;
I don't understand why this is correct. If the first phase fragment is suitably 
partitioned then we still need to do the second phase. Where do we add the 
second phase aggregation if we return here?

Is this because shuffleDistinctExpr conflates whether there are grouping exprs 
or not and what the query option is set to?


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@952
PS3, Line 952:   if (shuffleDistinctExpr) {
Move this before L959 for better readability. Also you can just:

if (!shuffleDistinctExpr) return firstFragment;


http://gerrit.cloudera.org:8080/#/c/9949/3/testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test:

http://gerrit.cloudera.org:8080/#/c/9949/3/testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test@1
PS3, Line 1: # distinct agg with grouping over a union
Let's try to distill minimal tests that are needed here.
* distinct with and without grouping
* first phase fragment is already suitably partitioned or not

Looks like we have a bunch of tests here with unnecessary stuff like unions, 
etc.

I'll spend some time looking over these and the existing tests, so we can come 
up with a lean and effective test suite.



--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Mon, 09 Apr 2018 19:16:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-06 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test
7 files changed, 856 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/3
--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-06 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test
7 files changed, 856 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/2
--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 


[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs

2018-04-06 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9949


Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
..

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-on-distinct.test
7 files changed, 856 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/1
--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang