Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10771 )
Change subject: IMPALA-110: Support for multiple DISTINCT ...................................................................... Patch Set 7: (25 comments) comments from initial pass. still looking at several backend classes, some of the tests, and multi-aggregate info. http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/10771/7/be/src/exec/aggregation-node-base.cc@85 PS7, Line 85: fast path to avoid going through all tuples optimization that bets that the index of non-null agg will be the same from one tuple to the next. http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h File be/src/exec/aggregator.h: http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exec/aggregator.h@64 PS6, Line 64: int agg_idx add a comment mentioning what this index is relative to http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h File be/src/exprs/valid-tuple-id.h: http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/exprs/valid-tuple-id.h@28 PS6, Line 28: /// Valid input rows must have exactly one non-NULL tuple. what happens if it doesn't? http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/10771/6/be/src/runtime/row-batch.h@247 PS6, Line 247: int RowNum() { return (row_ - parent_->tuple_ptrs_) / num_tuples_per_row_; } this can't be < 0 or produce a div zero error? http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/10771/6/common/thrift/PlanNodes.thrift@401 PS6, Line 401: replicate_input add a comment for this http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/10771/7/common/thrift/PlanNodes.thrift@397 PS7, Line 397: pls add comments for this one (in particular, replicate_input) http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@38 PS6, Line 38: A mix of distinct and non-distinct aggregation functions is allowed as long as all : * distinct functions have the same distinct expressions. I did not understand this-- clarify with a counter-example. http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@50 PS7, Line 50: * - Only a non-distinct class: add an example, such as: max(a) http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@52 PS7, Line 52: * - One distinct class, and optionally a non-distinct class: example, count(distinct a)[, max(b)] http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@56 PS7, Line 56: * - Multiple distinct classes, and optionally a non-distinct class example: count(distinct a), count(distinct b)[, max(c)] http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@233 PS7, Line 233: DistinctAggExprs.isEmpty ideally, this reads without negations... I see three there. perhaps hasNonDistinctAggExprs? http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@314 PS7, Line 314: * - Result tuple id 1 having 3 slots with ids 3, 4, 5 relate these to the columns/exprs in the example. http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@66 PS6, Line 66: MultiAggregateInfo does a single aggregate here boil down to a single AggregateInfo? http://gerrit.cloudera.org:8080/#/c/10771/6/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387 PS6, Line 387: multiAggInfo_.materializeRequiredSlots(analyzer, baseTblSmap_); where is having dealt with now? http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387 PS7, Line 387: multiAggInfo_.materializeRequiredSlots(analyzer, baseTblSmap_); any idea why it works out here to not worry about the having clause here? http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@268 PS7, Line 268: select nit: selects http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@286 PS7, Line 286: two with two classes in this example, also add where three case-when clauses came from. http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@348 PS7, Line 348: create(0); this is used as a sentinel, so must differ from any non-sentinel. is the expression used for this purpose (e.g., murmur_hash vs. numeric literal) or its evaluated value (e.g., eval(murmur_hash(arg)) vs 0)? if its the evaluated value, is there ambiguity? http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@480 PS7, Line 480: be nit: expand to backend (this just looked odd when quickly scanning). http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@a794 PS7, Line 794: any idea why this is removed? same question for methods below as well. http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2804 PS7, Line 2804: private void validateSingleColAppxCountDistinctRewrite( pls add comments for these. http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@2160 PS7, Line 2160: powerSet nice http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java: http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a61 PS7, Line 61: where'd these go? http://gerrit.cloudera.org:8080/#/c/10771/7/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@a138 PS7, Line 138: similarly, why is this dropped? http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test File testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test: http://gerrit.cloudera.org:8080/#/c/10771/7/testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test@192 PS7, Line 192: # Multiple distinct and non-distinct, with an intermediate tuple (avg) add a variant of this with a group by -- To view, visit http://gerrit.cloudera.org:8080/10771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4 Gerrit-Change-Number: 10771 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Fri, 24 Aug 2018 17:05:34 +0000 Gerrit-HasComments: Yes