Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9777 )

Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function
......................................................................


Patch Set 2:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@7
PS1, Line 7: Part 1
> What's in Part 2?
Part2 adds percentile_cont and median: https://gerrit.cloudera.org/c/9778/. 
Part3 adds percentile functions to the query generator: 
https://gerrit.cloudera.org/c/9878/.


http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@17
PS1, Line 17: throws an error and impala returns NULL.
            :
            : Some FE and EE tests are added. EE tests not related to the above
            : difference are v
> pls format this so its easier to read. for example, the ..._diff_0, * from
The example is now in StmtRewriter.java. It's hard to fit it formatted in the 
commit message.


http://gerrit.cloudera.org:8080/#/c/9777/1//COMMIT_MSG@24
PS1, Line 24:
> Why is that?
To do so we need a mechanism to throw an error in an expr, which we currently 
don't have.


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@363
PS1, Line 363: requiresPercentileRewrite()
> first thing that got my attention is the way the rewrite for this case is h
-  The rewriting of percentile is called in 
StmtRewriter.rewriteSelectStatement(). The pattern there is "if (hasFoo()) then 
rewriteBar()", so this percentile rewriting doesn't really stand out there.
- In my understanding a rewrite rule is usually a stmt -> stmt function. If 
such functions are put in a list and called in a loop then it can be named as a 
rule-based approach. Impala doesn't have such pattern for stmt rewriting for 
now.


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java
File fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java:

http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@49
PS1, Line 49: f (!super.localEquals(that)) return false;
> "Rewrite 'stmt' containing percentile functions in an equivalent statement
Done. I added more detailed description in the comment of this function.


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@52
PS1, Line 52:
            :
            :   @Override
            :   public String toSqlImpl() {
> I think a more elaborate example may help here. For instance: "select perce
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@56
PS1, Line 56: eturn "percentile_disc(" + percentileExpr().toSq
> Is the assumption that 'stmt' is analyzed before calling this function? If
It doesn't need to be analyzed. I added this in to the comment.


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@58
PS1, Line 58: }
> Shouldn't that be in StmtRewriter.java?
It's moved there now.


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59
PS1, Line 59:
> nit: just for consistency's sake, can you plz use Guava's newArrayList() st
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@59
PS1, Line 59:
> Comment what is stored here.
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@63
PS1, Line 63: centileAggExpr should be rewritten without being passed to BE");
> Just by looking at this piece of code, it's not clear the extend to which e
renamed to rewritePercentileExpr


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@84
PS1, Line 84:
> nit: do you need that cast?
Yes. There are two constructors: InlineViewRef(String alias, QueryStmt 
queryStmt, List<String> colLabels) and InlineViewRef(String alias, QueryStmt 
queryStmt, TableSampleClause sampleParams)
The only difference between there signature is the type of the last one so it 
would be ambiguous to just call with null.


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/analysis/PercentileAggExpr.java@90
PS1, Line 90:
            :   Expr orderByExpr() { return getChild(0); }
            : }
> This is a bit too dense and not easy to follow. I think it may be better to
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1001
PS1, Line 1001: do n
> "do"
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/9777/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@2329
PS1, Line 2329:
> There are many more interesting test cases such as: a) percentiles in selec
I added some EE tests covering these, because we may be interested in the 
results of these queries. Is there need to do AnalyzeOk(...) here for the 
queries that are already in the EE tests?


http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@1455
PS1, Line 1455: ====
> Percentile functions may show up in both having and order by clauses. Add a
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2352
PS1, Line 2352: ====
> Also add a case where a percentile shows up in the subquery (scalar and non
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1377
PS1, Line 1377: select percentile_disc(-1) within group (order by tinyint_col),
              :   percentile_disc(1.1) within group (order by tinyint_col) from 
alltypesagg;
              : ---- RESULTS
              : NULL,NULL
              : ---- TYPES
              : TINYINT,TINYINT
              : ====
              : ---- QUERY
              : # The following percentile tests are verified against 
PostgreSQL.
              : # Test percentile=0, 1. Null should not appear at any end.
              : select per
> Just write one query that covers these cases "percentile_disc(1.1)..., perc
Done


http://gerrit.cloudera.org:8080/#/c/9777/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1414
PS1, Line 1414: 5,'545'
              : 6,'556'
              : 7,'557'
              : 8,'558'
              : 9,'559'
              : NULL,'530'
              : ---- TYPES
              : TINYINT,STRING
              : ====
              : ---- QUERY
              : # Test non-const percentile expr.
              : select tinyint_col, percentile_disc(case when tinyint_col >= 5 
then 0.8 else 0.2 end)
              :   within group (order by bigint_col) from alltypesagg
              :   group by tinyint_col order by tinyint_col;
              : ---- RESULTS
              : 1,1910
              : 2,1920
              : 3,1930
              : 4,1940
              : 5,7950
              : 6,7960
              : 7,7970
              : 8,7980
              : 9,7990
              : NULL,2000
              : ---- TYPES
              : TINYINT
> Same comment as before. I'd just replace these with one query and multiple
Done



--
To view, visit http://gerrit.cloudera.org:8080/9777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e
Gerrit-Change-Number: 9777
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 30 Mar 2018 22:35:26 +0000
Gerrit-HasComments: Yes

Reply via email to