Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9777 )
Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function ...................................................................... Patch Set 2: (19 comments) http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@7 PS1, Line 7: Part 1 > What's in Part 2? Part2 adds percentile_cont and median: https://gerrit.cloudera.org/c/9778/. Part3 adds percentile functions to the query generator: https://gerrit.cloudera.org/c/9878/. http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@17 PS1, Line 17: throws an error and impala returns NULL. : : Some FE and EE tests are added. EE tests not related to the above : difference are v > pls format this so its easier to read. for example, the ..._diff_0, * from The example is now in StmtRewriter.java. It's hard to fit it formatted in the commit message. http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@24 PS1, Line 24: > Why is that? To do so we need a mechanism to throw an error in an expr, which we currently don't have. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@363 PS1, Line 363: requiresPercentileRewrite() > first thing that got my attention is the way the rewrite for this case is h - The rewriting of percentile is called in StmtRewriter.rewriteSelectStatement(). The pattern there is "if (hasFoo()) then rewriteBar()", so this percentile rewriting doesn't really stand out there. - In my understanding a rewrite rule is usually a stmt -> stmt function. If such functions are put in a list and called in a loop then it can be named as a rule-based approach. Impala doesn't have such pattern for stmt rewriting for now. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java File fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@49 PS1, Line 49: f (!super.localEquals(that)) return false; > "Rewrite 'stmt' containing percentile functions in an equivalent statement Done. I added more detailed description in the comment of this function. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@52 PS1, Line 52: : : @Override : public String toSqlImpl() { > I think a more elaborate example may help here. For instance: "select perce Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@56 PS1, Line 56: eturn "percentile_disc(" + percentileExpr().toSq > Is the assumption that 'stmt' is analyzed before calling this function? If It doesn't need to be analyzed. I added this in to the comment. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@58 PS1, Line 58: } > Shouldn't that be in StmtRewriter.java? It's moved there now. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59 PS1, Line 59: > nit: just for consistency's sake, can you plz use Guava's newArrayList() st Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59 PS1, Line 59: > Comment what is stored here. Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@63 PS1, Line 63: centileAggExpr should be rewritten without being passed to BE"); > Just by looking at this piece of code, it's not clear the extend to which e renamed to rewritePercentileExpr http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@84 PS1, Line 84: > nit: do you need that cast? Yes. There are two constructors: InlineViewRef(String alias, QueryStmt queryStmt, List<String> colLabels) and InlineViewRef(String alias, QueryStmt queryStmt, TableSampleClause sampleParams) The only difference between there signature is the type of the last one so it would be ambiguous to just call with null. http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@90 PS1, Line 90: : Expr orderByExpr() { return getChild(0); } : } > This is a bit too dense and not easy to follow. I think it may be better to Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1001 PS1, Line 1001: do n > "do" Done http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@2329 PS1, Line 2329: > There are many more interesting test cases such as: a) percentiles in selec I added some EE tests covering these, because we may be interested in the results of these queries. Is there need to do AnalyzeOk(...) here for the queries that are already in the EE tests? http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@1455 PS1, Line 1455: ==== > Percentile functions may show up in both having and order by clauses. Add a Done http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2352 PS1, Line 2352: ==== > Also add a case where a percentile shows up in the subquery (scalar and non Done http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1377 PS1, Line 1377: select percentile_disc(-1) within group (order by tinyint_col), : percentile_disc(1.1) within group (order by tinyint_col) from alltypesagg; : ---- RESULTS : NULL,NULL : ---- TYPES : TINYINT,TINYINT : ==== : ---- QUERY : # The following percentile tests are verified against PostgreSQL. : # Test percentile=0, 1. Null should not appear at any end. : select per > Just write one query that covers these cases "percentile_disc(1.1)..., perc Done http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1414 PS1, Line 1414: 5,'545' : 6,'556' : 7,'557' : 8,'558' : 9,'559' : NULL,'530' : ---- TYPES : TINYINT,STRING : ==== : ---- QUERY : # Test non-const percentile expr. : select tinyint_col, percentile_disc(case when tinyint_col >= 5 then 0.8 else 0.2 end) : within group (order by bigint_col) from alltypesagg : group by tinyint_col order by tinyint_col; : ---- RESULTS : 1,1910 : 2,1920 : 3,1930 : 4,1940 : 5,7950 : 6,7960 : 7,7970 : 8,7980 : 9,7990 : NULL,2000 : ---- TYPES : TINYINT > Same comment as before. I'd just replace these with one query and multiple Done -- To view, visit http://gerrit.cloudera.org:8080/9777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e Gerrit-Change-Number: 9777 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Fri, 30 Mar 2018 22:35:26 +0000 Gerrit-HasComments: Yes
