Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10771 )

Change subject: IMPALA-110: Support for multiple DISTINCT
......................................................................


Patch Set 7:

(25 comments)

comments from initial pass. still looking at several backend classes, some of 
the tests, and multi-aggregate info.

http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc@85
PS7, Line 85: fast path to avoid going through all tuples
optimization that bets that the index of non-null agg will be the same from one 
tuple to the next.


http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h
File be/src/exec/aggregator.h:

http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h@64
PS6, Line 64: int agg_idx
add a comment mentioning what this index is relative to


http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h
File be/src/exprs/valid-tuple-id.h:

http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h@28
PS6, Line 28: /// Valid input rows must have exactly one non-NULL tuple.
what happens if it doesn't?


http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h@247
PS6, Line 247:     int RowNum() { return (row_ - parent_->tuple_ptrs_) / 
num_tuples_per_row_; }
this can't be < 0 or produce a div zero error?


http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift@401
PS6, Line 401: replicate_input
add a comment for this


http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift@397
PS7, Line 397:
pls add comments for this one (in particular, replicate_input)


http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@38
PS6, Line 38: A mix of distinct and non-distinct aggregation functions is 
allowed as long as all
            :  * distinct functions have the same distinct expressions.
I did not understand this-- clarify with a counter-example.


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@50
PS7, Line 50:  * - Only a non-distinct class:
add an example, such as: max(a)


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@52
PS7, Line 52:  * - One distinct class, and optionally a non-distinct class:
example, count(distinct a)[, max(b)]


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@56
PS7, Line 56:  * - Multiple distinct classes, and optionally a non-distinct 
class
example: count(distinct a), count(distinct b)[, max(c)]


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@233
PS7, Line 233: DistinctAggExprs.isEmpty
ideally, this reads without negations... I see three there. perhaps 
hasNonDistinctAggExprs?


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@314
PS7, Line 314:    * - Result tuple id 1 having 3 slots with ids 3, 4, 5
relate these to the columns/exprs in the example.


http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@66
PS6, Line 66: MultiAggregateInfo
does a single aggregate here boil down to a single AggregateInfo?


http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387
PS6, Line 387:       multiAggInfo_.materializeRequiredSlots(analyzer, 
baseTblSmap_);
where is having dealt with now?


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387
PS7, Line 387:       multiAggInfo_.materializeRequiredSlots(analyzer, 
baseTblSmap_);
any idea why it works out here to not worry about the having clause here?


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@268
PS7, Line 268: select
nit: selects


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@286
PS7, Line 286: two
with two classes in this example, also add where three case-when clauses came 
from.


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@348
PS7, Line 348: create(0);
this is used as a sentinel, so must differ from any non-sentinel. is the 
expression used for this purpose (e.g., murmur_hash vs. numeric literal) or its 
evaluated value (e.g., eval(murmur_hash(arg)) vs 0)? if its the evaluated 
value, is there ambiguity?


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@480
PS7, Line 480: be
nit: expand to backend (this just looked odd when quickly scanning).


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@a794
PS7, Line 794:
any idea why this is removed? same question for methods below as well.


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2804
PS7, Line 2804:   private void validateSingleColAppxCountDistinctRewrite(
pls add comments for these.


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@2160
PS7, Line 2160: powerSet
nice


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java:

http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a61
PS7, Line 61:
where'd these go?


http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a138
PS7, Line 138:
similarly, why is this dropped?


http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test
File testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test:

http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test@192
PS7, Line 192: # Multiple distinct and non-distinct, with an intermediate tuple 
(avg)
add a variant of this with a group by



--
To view, visit http://gerrit.cloudera.org:8080/10771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4
Gerrit-Change-Number: 10771
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Aug 2018 17:05:34 +0000
Gerrit-HasComments: Yes

Reply via email to