Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10771 )

Change subject: IMPALA-110: Support for multiple DISTINCT
......................................................................


Patch Set 7:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc@85
PS7, Line 85: fast path to avoid going through all tuples
> optimization that bets that the index of non-null agg will be the same from
Done


http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h
File be/src/exec/aggregator.h:

http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h@64
PS6, Line 64: int agg_idx
> add a comment mentioning what this index is relative to
Done


http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h
File be/src/exprs/valid-tuple-id.h:

http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h@28
PS6, Line 28: /// Valid input rows must have exactly one non-NULL tuple.
> what happens if it doesn't?
In debug builds, we'll hit a dcheck. In release builds, we'll return the first 
non-null tuple.


http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h@247
PS6, Line 247:     int RowNum() { return (row_ - parent_->tuple_ptrs_) / 
num_tuples_per_row_; }
> this can't be < 0 or produce a div zero error?
No, num_tuples_per_row_ cannot be 0, enforced by dchecks in the RowBatch 
constructor.


http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift@401
PS6, Line 401: replicate_input
> add a comment for this
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift@397
PS7, Line 397:
> pls add comments for this one (in particular, replicate_input)
Done


http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@38
PS6, Line 38: A mix of distinct and non-distinct aggregation functions is 
allowed as long as all
            :  * distinct functions have the same distinct expressions.
> I did not understand this-- clarify with a counter-example.
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@50
PS7, Line 50:  * - Only a non-distinct class:
> add an example, such as: max(a)
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@52
PS7, Line 52:  * - One distinct class, and optionally a non-distinct class:
> example, count(distinct a)[, max(b)]
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@56
PS7, Line 56:  * - Multiple distinct classes, and optionally a non-distinct 
class
> example: count(distinct a), count(distinct b)[, max(c)]
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@233
PS7, Line 233: DistinctAggExprs.isEmpty
> ideally, this reads without negations... I see three there. perhaps hasNonD
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@314
PS7, Line 314:    * - Result tuple id 1 having 3 slots with ids 3, 4, 5
> relate these to the columns/exprs in the example.
Done


http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@66
PS6, Line 66: MultiAggregateInfo
> does a single aggregate here boil down to a single AggregateInfo?
Yes, if there is at most a single DISTINCT class, this MultiAggregateInfo will 
only contain a single AggregateInfo.


http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387
PS6, Line 387:       multiAggInfo_.materializeRequiredSlots(analyzer, 
baseTblSmap_);
> where is having dealt with now?
So I'm not entirely sure why having was special cased here before and no long 
needs to be (I think the special case here is just redundant and we would 
previously end up with the having pred present more than once in the list of 
exprs to materialize), but I've confirmed that having predicates are handled 
correctly:

We call Analyzer.registerConjunct on the having pred during 
analyzeAggregation(). Then, it will be returned by 
analyzer.getUnassignedConjuncts in multiAggInfo_.collectConjuncts


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387
PS7, Line 387:       multiAggInfo_.materializeRequiredSlots(analyzer, 
baseTblSmap_);
> any idea why it works out here to not worry about the having clause here?
Answered on the other revision.


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@268
PS7, Line 268: select
> nit: selects
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@286
PS7, Line 286: two
> with two classes in this example, also add where three case-when clauses ca
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@348
PS7, Line 348: create(0);
> this is used as a sentinel, so must differ from any non-sentinel. is the ex
This is just used in the hashing expression, so its value doesn't matter.

eg, in the example above, rows for the second agg class (distinct c) will be 
hashed according to 'd,c,0' so the same set of rows is hashed together in the 
exchange regardless of the value.

Perhaps the term 'sentinel' is confusing here, since it sort of means the value 
is used to indicate something, which isn't the case? I'll change it to 'dummy' 
value.


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@480
PS7, Line 480: be
> nit: expand to backend (this just looked odd when quickly scanning).
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@a794
PS7, Line 794:
> any idea why this is removed? same question for methods below as well.
Just a mistake. I put it back.


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2804
PS7, Line 2804:   private void validateSingleColAppxCountDistinctRewrite(
> pls add comments for these.
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@2160
PS7, Line 2160: powerSet
> nice
Done


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a61
PS7, Line 61:
> where'd these go?
This wasn't being used anywhere, and it wouldn't provide useful coverage 
anyways - the checks that are done here are already enforced by Preconditions 
in SelectStmt/BinaryPredicate.toThrift


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a138
PS7, Line 138:
> similarly, why is this dropped?
same as above


http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test
File testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test:

http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test@192
PS7, Line 192: # Multiple distinct and non-distinct, with an intermediate tuple 
(avg)
> add a variant of this with a group by
Done



--
To view, visit http://gerrit.cloudera.org:8080/10771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4
Gerrit-Change-Number: 10771
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Mon, 27 Aug 2018 23:12:52 +0000
Gerrit-HasComments: Yes

Reply via email to