[Impala-ASF-CR] IMPALA-110: Support for multiple DISTINCT

Tim Armstrong (Code Review) Fri, 17 Aug 2018 13:32:20 -0700

Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10771 )


Change subject: IMPALA-110: Support for multiple DISTINCT
......................................................................


Patch Set 5:

(13 comments)

I'm a +2 on the backend part. I took a look at the frontend and I think the 
high-level structure makes sense. The predicate assignment and smaps logic is a 
bit beyond me but I could invest some time in understanding that better.

http://gerrit.cloudera.org:8080/#/c/10771/4/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/10771/4/be/src/exec/streaming-aggregation-node.cc@139
PS4, Line 139:       continue;
> I'm not sure that its that much clearer, and the logic isn't quite the same
Seems ok, thanks for explaining.


http://gerrit.cloudera.org:8080/#/c/10771/5/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/10771/5/be/src/exec/streaming-aggregation-node.cc@158
PS5, Line 158: This
nit: caps


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@36
PS5, Line 36:  * Encapsulates all the information needed to compute a list of 
aggregate functions with
It might be worth explaining how this relates to MultiAggregateInfo - why it is 
separate (we use a different strategy for generating plans).


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@38
PS5, Line 38:  * A mix distinct and non-distinct aggregation functions is 
allowed as long as all
nit: mix of


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@205
PS5, Line 205:         throw new AnalysisException(
Is this reachable now? Can it be a precondition?

I don't see this error message in a test so either it's not reachable or we're 
missing test coverage.


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java:

http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java@33
PS5, Line 33:  * Base class for AggregateInfo and AnalyticInfo containing the 
intermediate and output
Update?


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@40
PS5, Line 40:  * SELECT block including their distributed execution.
Might be worth explaining more directly the relationship to AggregateInfo up 
the top.


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@366
PS5, Line 366: getTranspseGroupingExprs
typo: transpose


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@521
PS5, Line 521:     ExprSubstitutionMap result = new 
ExprSubstitutionMap(lhsExprs, rhsExprs);
nit: unnecessary intermediate variable 'result'


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@176
PS5, Line 176:       // TODO: If this is the transposition phase, then we can 
push conjuncts that
Is this an important optimisation?


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@277
PS5, Line 277: MURMUR_HASH
Tianyi decided to use FastHash for exchanges over murmur because it has 
slightly better properties (I can find the JIRA if needed). I wonder if we 
should also be using FastHash here instead. It probably doesn't make a 
difference in practice I guess so unclear if it's worth the effort, but it 
seems like the better thing to do.


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@453
PS5, Line 453:       // TODO: Determine node-level spillable buffer size. Does 
it even make sense?
Probably not if we're making the decision at the aggregator level.


http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@519
PS5, Line 519:     ResourceProfile result = new ResourceProfileBuilder()
nit: Unnecessary intermediate var.



--
To view, visit http://gerrit.cloudera.org:8080/10771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4
Gerrit-Change-Number: 10771
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 17 Aug 2018 20:31:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-110: Support for multiple DISTINCT

Reply via email to