Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/10771 )
Change subject: IMPALA-110: Support for multiple DISTINCT ...................................................................... Patch Set 7: (25 comments) http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc@85 PS7, Line 85: fast path to avoid going through all tuples > optimization that bets that the index of non-null agg will be the same from Done http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h File be/src/exec/aggregator.h: http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h@64 PS6, Line 64: int agg_idx > add a comment mentioning what this index is relative to Done http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h File be/src/exprs/valid-tuple-id.h: http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h@28 PS6, Line 28: /// Valid input rows must have exactly one non-NULL tuple. > what happens if it doesn't? In debug builds, we'll hit a dcheck. In release builds, we'll return the first non-null tuple. http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h@247 PS6, Line 247: int RowNum() { return (row_ - parent_->tuple_ptrs_) / num_tuples_per_row_; } > this can't be < 0 or produce a div zero error? No, num_tuples_per_row_ cannot be 0, enforced by dchecks in the RowBatch constructor. http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift@401 PS6, Line 401: replicate_input > add a comment for this Done http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift@397 PS7, Line 397: > pls add comments for this one (in particular, replicate_input) Done http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@38 PS6, Line 38: A mix of distinct and non-distinct aggregation functions is allowed as long as all : * distinct functions have the same distinct expressions. > I did not understand this-- clarify with a counter-example. Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@50 PS7, Line 50: * - Only a non-distinct class: > add an example, such as: max(a) Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@52 PS7, Line 52: * - One distinct class, and optionally a non-distinct class: > example, count(distinct a)[, max(b)] Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@56 PS7, Line 56: * - Multiple distinct classes, and optionally a non-distinct class > example: count(distinct a), count(distinct b)[, max(c)] Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@233 PS7, Line 233: DistinctAggExprs.isEmpty > ideally, this reads without negations... I see three there. perhaps hasNonD Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@314 PS7, Line 314: * - Result tuple id 1 having 3 slots with ids 3, 4, 5 > relate these to the columns/exprs in the example. Done http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@66 PS6, Line 66: MultiAggregateInfo > does a single aggregate here boil down to a single AggregateInfo? Yes, if there is at most a single DISTINCT class, this MultiAggregateInfo will only contain a single AggregateInfo. http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387 PS6, Line 387: multiAggInfo_.materializeRequiredSlots(analyzer, baseTblSmap_); > where is having dealt with now? So I'm not entirely sure why having was special cased here before and no long needs to be (I think the special case here is just redundant and we would previously end up with the having pred present more than once in the list of exprs to materialize), but I've confirmed that having predicates are handled correctly: We call Analyzer.registerConjunct on the having pred during analyzeAggregation(). Then, it will be returned by analyzer.getUnassignedConjuncts in multiAggInfo_.collectConjuncts http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387 PS7, Line 387: multiAggInfo_.materializeRequiredSlots(analyzer, baseTblSmap_); > any idea why it works out here to not worry about the having clause here? Answered on the other revision. http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@268 PS7, Line 268: select > nit: selects Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@286 PS7, Line 286: two > with two classes in this example, also add where three case-when clauses ca Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@348 PS7, Line 348: create(0); > this is used as a sentinel, so must differ from any non-sentinel. is the ex This is just used in the hashing expression, so its value doesn't matter. eg, in the example above, rows for the second agg class (distinct c) will be hashed according to 'd,c,0' so the same set of rows is hashed together in the exchange regardless of the value. Perhaps the term 'sentinel' is confusing here, since it sort of means the value is used to indicate something, which isn't the case? I'll change it to 'dummy' value. http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@480 PS7, Line 480: be > nit: expand to backend (this just looked odd when quickly scanning). Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@a794 PS7, Line 794: > any idea why this is removed? same question for methods below as well. Just a mistake. I put it back. http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2804 PS7, Line 2804: private void validateSingleColAppxCountDistinctRewrite( > pls add comments for these. Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@2160 PS7, Line 2160: powerSet > nice Done http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a61 PS7, Line 61: > where'd these go? This wasn't being used anywhere, and it wouldn't provide useful coverage anyways - the checks that are done here are already enforced by Preconditions in SelectStmt/BinaryPredicate.toThrift http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a138 PS7, Line 138: > similarly, why is this dropped? same as above http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test File testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test: http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test@192 PS7, Line 192: # Multiple distinct and non-distinct, with an intermediate tuple (avg) > add a variant of this with a group by Done -- To view, visit http://gerrit.cloudera.org:8080/10771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4 Gerrit-Change-Number: 10771 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Marshall <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Thomas Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Mon, 27 Aug 2018 23:12:52 +0000 Gerrit-HasComments: Yes
