[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@99 PS7, Line 99: outputGroupingExprSmap_ is this used (or planned to be used) anywhere? http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@187 PS7, Line 187: void setContainsPercentile() { public? http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@217 PS7, Line 217: boolean containsPercentile = false; make public as well? http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@354 PS7, Line 354: boolean selectBlockContainsPercentile() { return containsPercentile; } intentional visibility? http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@108 PS7, Line 108: rewriteSelectStatementHook(stmt, analyzer); Any way to separate this refactor into its own change? first example I saw: where did the helper 'hasSubqueryInDisunction' go? I'm guessing this doesn't have anything to specifically do with the change for percentile. Looks like a good cleanup overall, but since there is some subtle existing functionality here, it would be helpful to separate clean-up from the actual change. http://gerrit.cloudera.org:8080/#/c/9777/5/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java File fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java: http://gerrit.cloudera.org:8080/#/c/9777/5/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java@253 PS5, Line 253: if (isLogical_) { change this to a precondition -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 7 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 17 May 2018 21:11:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. IMPALA-4025: Part 1: Add percentile_disc aggregation function This patch adds aggregation function percentile_disc. The implementation is rewriting it into an inline view. The inline view computes the row number on the ordering expr using analytic functions. The parent query then picks the desired row using aggregation. An Example of such rewrite is in StmtRewriter.java. The behavior of this function is mostly the same as in PostgreSQL. The handling of percentile expr not in [0, 1] is different: PostgreSQL throws an error and impala returns NULL. Some FE and EE tests are added. EE tests not related to the above difference are verified against PostgreSQL. Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java A fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 22 files changed, 2,204 insertions(+), 1,021 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/9777/7 -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 7 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. IMPALA-4025: Part 1: Add percentile_disc aggregation function This patch adds aggregation function percentile_disc. The implementation is rewriting it into an inline view. The inline view computes the row number on the ordering expr using analytic functions. The parent query then picks the desired row using aggregation. An Example of such rewrite is in StmtRewriter.java. The behavior of this function is mostly the same as in PostgreSQL. The handling of percentile expr not in [0, 1] is different: PostgreSQL throws an error and impala returns NULL. Some FE and EE tests are added. EE tests not related to the above difference are verified against PostgreSQL. Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java A fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 23 files changed, 1,261 insertions(+), 105 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/9777/6 -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 6 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. IMPALA-4025: Part 1: Add percentile_disc aggregation function This patch adds aggregation function percentile_disc. The implementation is rewriting it into an inline view. The inline view computes the row number on the ordering expr using analytic functions. The parent query then picks the desired row using aggregation. An Example of such rewrite is in StmtRewriter.java. The behavior of this function is mostly the same as in PostgreSQL. The handling of percentile expr not in [0, 1] is different: PostgreSQL throws an error and impala returns NULL. Some FE and EE tests are added. EE tests not related to the above difference are verified against PostgreSQL. Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java A fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 23 files changed, 1,263 insertions(+), 127 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/9777/5 -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. Patch Set 4: (5 comments) add a tosql test to exercise that code path. http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java File fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java: http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java@160 PS4, Line 160: fn.setIsPersistent(true); why is this set to true? http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java@247 PS4, Line 247: public TFunction toThrift() { logical fns should never escape the fe, so should we poison the cases where they could get out? http://gerrit.cloudera.org:8080/#/c/9777/4/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/9777/4/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1372 PS4, Line 1372: remove the blank line http://gerrit.cloudera.org:8080/#/c/9777/4/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1553 PS4, Line 1553: # Percentile with subquery rewrite Add a case where a constant is on LHS, e.g., instead of int_col not in ... -> 10 not in ... That tests interaction between two rewrites in stmt rewriter. http://gerrit.cloudera.org:8080/#/c/9777/4/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1558 PS4, Line 1558: from alltypesagg); lets add tests for: - correlated subquery - outer join (percentile over resulting, nullable columns) - case where ordering column values are all null or contain null - percentile is specified as part of a stored view definition do we have a flag for controlling null first/last when sorting (I assume its only specified at the query/clause level)? -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Apr 2018 18:50:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@99 PS4, Line 99: private ExprSubstitutionMap outputGroupingExprSmap_ = new ExprSubstitutionMap(); This seems like a subset of outputTupleSmap_. If it's just for checking whether an expr is bounded by grouping exprs, why can't we use outputTupleSmap_? http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@353 PS4, Line 353: boolean topStmtContainsPercentile() { return containsPercentile; } I haven't seen yet where this is used but I feel this is not correct. A stmt may have a hierarchy of analyzers each one corresponding to a select block and each one with or without a percentile function. So "topStmtContainsPercentile" in the context of single analyzer seems kind of off. http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@468 PS4, Line 468: mergeAggInputFn_.analyze(analyzer); Are you sure this is correct? Won't this call Expr.analyzer() which then subsequently calls FunctionCallExpr.analyzeImpl()? http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@63 PS4, Line 63: // slotRefs directly refer to table(s) in the from clause. Can you plz expand the comment a bit to describe why this is needed? http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1140 PS4, Line 1140: Those slotRefs Unfinished sentence http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1143 PS4, Line 1143: slotRef mention what this slotRef represents http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1163 PS4, Line 1163: + nit: add a space before '+' http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/9777/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@2124 PS4, Line 2124: "from functional.alltypesagg group by tinyint_col"); Can you also add a case with a WITH clause? -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 19 Apr 2018 21:07:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. IMPALA-4025: Part 1: Add percentile_disc aggregation function This patch adds aggregation function percentile_disc. The implementation is rewriting it into a subquery. The subquery computes the row number on the ordering expr using analytic functions. The parent query then picks the desired row using aggregation. An Example of such rewrite is in StmtRewriter.java. The behavior of this function is mostly the same as in PostgreSQL. The handling of percentile expr not in [0, 1] is different: PostgreSQL throws an error and impala returns NULL. Some FE and EE tests are added. EE tests not related to the above difference are verified against PostgreSQL. Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java A fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 20 files changed, 1,092 insertions(+), 99 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/9777/4 -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/9777/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/9777/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@149 PS3, Line 149: void setRawPath(List rawPath) { > It's not needed. But the existing expr substitution utilities have several Modifying the SlotRef in place sort of violates the way we generally handle cases like this (e.g. when an expr needs to be replaced by another expr). I am in favor of refactoring the substitution logic to make it more generic and flexible but I'll let Alex chime on it since I haven't thought of all the details. -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 12 Apr 2018 18:14:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. IMPALA-4025: Part 1: Add percentile_disc aggregation function This patch adds aggregation function percentile_disc. The implementation is rewriting it into a subquery. The subquery computes the row number on the ordering expr using analytic functions. The parent query then picks the desired row using aggregation. An Example of such rewrite is in StmtRewriter.java. The behavior of this function is mostly the same as in PostgreSQL. The handling of percentile expr not in [0, 1] is different: PostgreSQL throws an error and impala returns NULL. Some FE and EE tests are added. EE tests not related to the above difference are verified against PostgreSQL. Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 17 files changed, 863 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/9777/3 -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. IMPALA-4025: Part 1: Add percentile_disc aggregation function This patch adds aggregation function percentile_disc. The implementation is rewriting it into a subquery. The subquery computes the row number on the ordering expr using analytic functions. The parent query then picks the desired row using aggregation. An Example of such rewrite is in StmtRewriter.java. The behavior of this function is mostly the same as in PostgreSQL. The handling of percentile expr not in [0, 1] is different: PostgreSQL throws an error and impala returns NULL. Some FE and EE tests are added. EE tests not related to the above difference are verified against PostgreSQL. Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 17 files changed, 863 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/9777/2 -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. Patch Set 2: (19 comments) http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@7 PS1, Line 7: Part 1 > What's in Part 2? Part2 adds percentile_cont and median: https://gerrit.cloudera.org/c/9778/. Part3 adds percentile functions to the query generator: https://gerrit.cloudera.org/c/9878/. http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@17 PS1, Line 17: throws an error and impala returns NULL. : : Some FE and EE tests are added. EE tests not related to the above : difference are v > pls format this so its easier to read. for example, the ..._diff_0, * from The example is now in StmtRewriter.java. It's hard to fit it formatted in the commit message. http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@24 PS1, Line 24: > Why is that? To do so we need a mechanism to throw an error in an expr, which we currently don't have. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@363 PS1, Line 363: requiresPercentileRewrite() > first thing that got my attention is the way the rewrite for this case is h - The rewriting of percentile is called in StmtRewriter.rewriteSelectStatement(). The pattern there is "if (hasFoo()) then rewriteBar()", so this percentile rewriting doesn't really stand out there. - In my understanding a rewrite rule is usually a stmt -> stmt function. If such functions are put in a list and called in a loop then it can be named as a rule-based approach. Impala doesn't have such pattern for stmt rewriting for now. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java File fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@49 PS1, Line 49: f (!super.localEquals(that)) return false; > "Rewrite 'stmt' containing percentile functions in an equivalent statement Done. I added more detailed description in the comment of this function. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@52 PS1, Line 52: : : @Override : public String toSqlImpl() { > I think a more elaborate example may help here. For instance: "select perce Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@56 PS1, Line 56: eturn "percentile_disc(" + percentileExpr().toSq > Is the assumption that 'stmt' is analyzed before calling this function? If It doesn't need to be analyzed. I added this in to the comment. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@58 PS1, Line 58: } > Shouldn't that be in StmtRewriter.java? It's moved there now. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59 PS1, Line 59: > nit: just for consistency's sake, can you plz use Guava's newArrayList() st Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59 PS1, Line 59: > Comment what is stored here. Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@63 PS1, Line 63: centileAggExpr should be rewritten without being passed to BE"); > Just by looking at this piece of code, it's not clear the extend to which e renamed to rewritePercentileExpr http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@84 PS1, Line 84: > nit: do you need that cast? Yes. There are two constructors: InlineViewRef(String alias, QueryStmt queryStmt, List colLabels) and InlineViewRef(String alias, QueryStmt queryStmt, TableSampleClause sampleParams) The only difference between there signature is the type of the last one so it would be ambiguous to just call with null. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@90 PS1, Line 90: : Expr orderByExpr() { return getChild(0); } : } > This is a bit too dense and not easy to follow. I think it may be better to Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.clo
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. Patch Set 1: (2 comments) initial comment... still reading. http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@17 PS1, Line 17: select min(case when _percentile_row_number_diff_0 >= 0 then : _percentile_view.col end) from (select row_number() over (order by col) : - 0.5 * count(col) over() _percentile_row_number_diff_0, * from tbl) : _percentile_view pls format this so its easier to read. for example, the ..._diff_0, * from tbl) _percentile_view is difficult to follow/looks like an error. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@363 PS1, Line 363: requiresPercentileRewrite() first thing that got my attention is the way the rewrite for this case is handled. it seems fairly special: percentile is one specific type of aggregate function yet it has its unique handling here. I don't have a better suggestion right now-- looking into this more-- but can you comment on the other ways you might have tried this and why they didn't work (e.g., more generic rewrite rule approach) -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 28 Mar 2018 16:18:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 ) Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. Patch Set 1: (17 comments) http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@7 PS1, Line 7: Part 1 What's in Part 2? http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@24 PS1, Line 24: impala returns NULL Why is that? http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java File fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@49 PS1, Line 49: Rewrite 'stmt' into a child query and a parent query "Rewrite 'stmt' containing percentile functions in an equivalent statement using subqueries. Percentile functions may exist in: a) the select list, b) the HAVING clause, and c) the ORDER BY clause.". Also, do we assume that 'stmt' contains at least one percentile fn or it doesn't matter? What happens if there is no percentile fn? Maybe mention that as well. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@52 PS1, Line 52: "select percentile_disc(0.5) within group (order by col) from tbl" :* would be rewritten to "select min(case when _percentile_row_number_diff_0 >= 0 then :* _percentile_view.col end) from (select row_number() over (order by col) - :* 0.5 * count(col) over() _percentile_row_number_diff_0 from tbl) _percentile_view". I think a more elaborate example may help here. For instance: "select percentile_disc(0.5) within group (order by col), b from tbl where c < 10 group by b having percentile_disc(0.1) within group (order by col) > 10;" That example would highlight a few more things such as as the generation of multiple child query select items, how the WHERE clause of the original stmt is transferred to the subquery, and how the having clause is rewritten as well. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@56 PS1, Line 56: 'stmt' needs to be reanalyzed after the rewrite. Is the assumption that 'stmt' is analyzed before calling this function? If so, mention that in the comment and add a precondition check at the beginning of the function. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@58 PS1, Line 58: static void rewriteSelectStmt(SelectStmt stmt) { Shouldn't that be in StmtRewriter.java? http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59 PS1, Line 59: new ArrayList<>(); nit: just for consistency's sake, can you plz use Guava's newArrayList() static function? http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59 PS1, Line 59: ArrayList childSelectListItems = new ArrayList<>(); Comment what is stored here. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@63 PS1, Line 63: reWriteExpr(item.getExpr(), stmt.groupingExprs_, childSelectListItems)); Just by looking at this piece of code, it's not clear the extend to which exprs are rewritten, i.e. do you handle only the percentile case or other expr rewrites as well? Maybe rename the function to something more explicit like rewritePercentileExpr or something along these lines. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@84 PS1, Line 84: (TableSampleClause) nit: do you need that cast? http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@90 PS1, Line 90: The generated child query select :* list items will be added to 'childSelectListItems' and the rewritten expr will be :* returned. This is a bit too dense and not easy to follow. I think it may be better to describe this as follows. From each percentile expr 'e', two new exprs are generated (based on the transformation described in rewriteSelectStmt() function), one that replaces 'e' in the select list of the original statement and one that is added to the select list of the generated subquery. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1001 PS1, Line 1001: does "do" http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.ja
[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function
Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9777 Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function .. IMPALA-4025: Part 1: Add percentile_disc aggregation function This patch adds aggregation function percentile_disc. The implementation is rewriting it into a subquery. The subquery computes the row number on the ordering expr using analytic functions. The parent query then picks the desired row using aggregation. An Example of such rewrite: select percentile_disc(0.5) within group (order by col) from tbl would be rewritten into select min(case when _percentile_row_number_diff_0 >= 0 then _percentile_view.col end) from (select row_number() over (order by col) - 0.5 * count(col) over() _percentile_row_number_diff_0, * from tbl) _percentile_view The behavior of this function is mostly the same as in PostgreSQL. The handling of percentile expr not in [0, 1] is different: PostgreSQL throws an error and impala returns NULL. Some FE and EE tests are added. EE tests not related to the above difference are verified against PostgreSQL. Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 16 files changed, 659 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/9777/1 -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang