Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9949 )

Change subject: IMPALA-6822: Add a query option to control shuffling by 
distinct exprs
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9949/3/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/9949/3/be/src/service/query-options.h@44
PS3, Line 44:       TImpalaQueryOptions::SHUFFLE_DISTINCT_EXPR + 1);\
Let's call this SHUFFLE_DISTINCT_EXPRS because there could be several such exprs


http://gerrit.cloudera.org:8080/#/c/9949/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/9949/3/common/thrift/ImpalaInternalService.thrift@276
PS3, Line 276:   // When a query has both grouping exprs and a distinct expr, 
impala can shuffle by
* both grouping and distinct exprs
* Impala can optionally include the distinct exprs in the hash exchange of the 
first aggregation phase to spread the data among more nodes. However, this plan 
requires another  hash exchange on the grouping exprs in the second phase which 
is not required when when omitting the distinct exprs in the first phase. 
Shuffling by both is better if the grouping exprs have a low NDVs.


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@886
PS3, Line 886:     // When a query has both grouping exprs and a distinct expr, 
impala can shuffle by
same comments as in .thrift


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@889
PS3, Line 889:     boolean shuffleDistinctExpr = 
ctx_.getQueryOptions().shuffle_distinct_expr ||
shuffleDistinctExprs

I think it's cleaner to separate the value of the query option from 
hasGroupingExprs


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@931
PS3, Line 931:       if (!shuffleDistinctExpr) return firstMergeFragment;
I don't understand why this is correct. If the first phase fragment is suitably 
partitioned then we still need to do the second phase. Where do we add the 
second phase aggregation if we return here?

Is this because shuffleDistinctExpr conflates whether there are grouping exprs 
or not and what the query option is set to?


http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@952
PS3, Line 952:       if (shuffleDistinctExpr) {
Move this before L959 for better readability. Also you can just:

if (!shuffleDistinctExpr) return firstFragment;


http://gerrit.cloudera.org:8080/#/c/9949/3/testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test:

http://gerrit.cloudera.org:8080/#/c/9949/3/testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test@1
PS3, Line 1: # distinct agg with grouping over a union
Let's try to distill minimal tests that are needed here.
* distinct with and without grouping
* first phase fragment is already suitably partitioned or not

Looks like we have a bunch of tests here with unnecessary stuff like unions, 
etc.

I'll spend some time looking over these and the existing tests, so we can come 
up with a lean and effective test suite.



--
To view, visit http://gerrit.cloudera.org:8080/9949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 9949
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Apr 2018 19:16:42 +0000
Gerrit-HasComments: Yes

Reply via email to