Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 )

Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
......................................................................


Patch Set 13:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/15997/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15997/7//COMMIT_MSG@7
PS7, Line 7: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
> overall, I think this commit message is a bit verbose in terms of describin
ping - I think this still needs to be addressed


http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@41
PS13, Line 41: Testing:
since you ran all core tests, you should add an entry here that says "Ran core tests".


http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@44
PS13, Line 44: 2 Run unit tests against other tables such as tpcds.store_sales
which unit tests are you talking about? if these are already included in the "core" tests, you don't need to list them here.


http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@47
PS13, Line 47: select ndv(c_name, 1) "one", ndv(c_name, 2) two, ndv(c_name, 3) three,
             : ndv(c_name, 4) as four, ndv(c_name, 5) as five, ndv(c_name, 6) as six,
             : ndv(c_name, 7) as seven, ndv(c_name, 8) as eight, ndv(c_name, 9) as nine,
             : ndv(c_name, 10)  as ten
             : from tpch.customer;
             :
             : select ndv(ss_sold_time_sk, 1) "one", ndv(ss_sold_time_sk, 2) two,
             : ndv(ss_sold_time_sk, 3) three, ndv(ss_sold_time_sk, 4) as four,
             : ndv(ss_sold_time_sk, 5) as five, ndv(ss_sold_time_sk, 6) as six,
             : ndv(ss_sold_time_sk, 7) as seven, ndv(ss_sold_time_sk, 8) as eight,
             : ndv(ss_sold_time_sk, 9) as nine, ndv(ss_sold_time_sk, 10) as ten
             : from tpcds.store_sales;
I think adding these to the commit message makes it too verbose. You just need to mention that you ran ndv(column, precision) for all possible values (1-10).


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1441
PS13, Line 1441: ComputeSizeOfIntermediateTypeForNDV
this needs documentation.
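e.g. a doc comment along these lines. This is only a sketch: the signature and the one-byte-per-register layout are assumptions, inferred from `int precision = log2(hll_len);` at line 1468 (i.e. hll_len == 2^precision), not read from the patch.

```cpp
#include <cassert>

// Hypothetical signature; the actual function in the patch may differ.
/// Returns the size, in bytes, of the intermediate value NDV() uses for an HLL
/// with the given 'precision'. The HLL has 2^precision registers and, in this
/// sketch, each register takes one byte, so the result is 1 << precision.
/// 'precision' must be in [0, 30] so that the shift cannot overflow an int.
int ComputeSizeOfIntermediateTypeForNDV(int precision) {
  assert(precision >= 0 && precision <= 30);
  return 1 << precision;
}
```

With the default precision of 10 this would be 2^10 = 1024 bytes.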


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1468
PS13, Line 1468:   int precision = log2(hll_len);
is there a reason this needs to be re-computed during each update?
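if hll_len is always a power of two, the precision could be derived once (e.g. when the intermediate buffer is initialized) and reused, with integer math instead of a floating-point log2(). A rough sketch of the idea; `HllSketch` and `PrecisionFromHllLen` are hypothetical names, and the register update is a generic HLL update included only so the cached precision has a use, not Impala's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: derives the HLL precision from a power-of-two register
// count using integer arithmetic (no floating-point log2()).
inline int PrecisionFromHllLen(int hll_len) {
  assert(hll_len > 0 && (hll_len & (hll_len - 1)) == 0);  // power of two
  int precision = 0;
  while ((1 << precision) < hll_len) ++precision;
  return precision;
}

// Hypothetical stand-alone HLL: the precision is computed once in the
// constructor and cached, so Update() never recomputes it.
class HllSketch {
 public:
  explicit HllSketch(int hll_len)
      : precision_(PrecisionFromHllLen(hll_len)),
        registers_(static_cast<std::size_t>(hll_len), 0) {}

  // Per-row update using the cached precision.
  void Update(std::uint64_t hash) {
    // The low 'precision_' bits of the hash select the register.
    const std::uint64_t index = hash & ((std::uint64_t{1} << precision_) - 1);
    // Rank = 1 + number of trailing zeros in the remaining bits, capped at
    // 65 - precision_ when those bits are all zero.
    std::uint64_t rest = hash >> precision_;
    std::uint8_t rank = 1;
    while (rank <= 64 - precision_ && (rest & 1) == 0) {
      rest >>= 1;
      ++rank;
    }
    if (rank > registers_[index]) registers_[index] = rank;
  }

  int precision() const { return precision_; }
  const std::vector<std::uint8_t>& registers() const { return registers_; }

 private:
  const int precision_;
  std::vector<std::uint8_t> registers_;
};
```

The per-row path then only does bit manipulation; the precision lookup moves out of the hot loop.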


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@196
PS13, Line 196: HLL_PRECISION = 10; // default precision
probably worth just renaming this to DEFAULT_HLL_PRECISION to keep it 
consistent with the MIN/MAX_HLL_PRECISION variables. you can remove the 
"default precision" comment as well.


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@203
PS13, Line 203: HLL_LEN
same here, rename to DEFAULT_HLL_LEN
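roughly like this (a sketch only; deriving the length from the precision is a suggestion, and MIN_HLL_PRECISION/MAX_HLL_PRECISION are left out because their values aren't shown here):

```cpp
// Hypothetical sketch of the renamed constants from aggregate-functions.h.
// Only the default precision (10) comes from the patch.
constexpr int DEFAULT_HLL_PRECISION = 10;

// Deriving the default length from the default precision keeps the two
// constants from drifting apart (2^10 = 1024 registers).
constexpr int DEFAULT_HLL_LEN = 1 << DEFAULT_HLL_PRECISION;

static_assert(DEFAULT_HLL_LEN == 1024, "default HLL has 2^10 registers");
```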


http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@592
PS9, Line 592:       if (fn_ == null)
> Can you please explain?
it should be:

 if (fn_ == null) {
   throw new AnalysisException(
       "A suitable intermediate data type can not be found for the second parameter "
       + children_.get(1).toSql() + " in NDV()");
 }

notice how there are curly braces around the body of the if statement


http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@352
PS9, Line 352: HLL_UPDATE_SYMBOL_T
> Change to HLL_UPDATE_SYMBOL_TWO_ARGS.
I think it can be more descriptive still. Something like 
HLL_UPDATE_SYMBOL_WITH_PRECISION would be better.


http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@65
PS13, Line 65: ArrayList
should be List instead of ArrayList: 
https://stackoverflow.com/questions/2279030/type-list-vs-type-arraylist-in-java


http://gerrit.cloudera.org:8080/#/c/15997/9/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/15997/9/tests/query_test/test_aggregation.py@318
PS9, Line 318: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
> These values numerate over all the columns in the select list.
ahh, I thought this was the same as the list on line 301, but they are different

but yeah, you should use xrange here and above instead, since it's much more concise.



--
To view, visit http://gerrit.cloudera.org:8080/15997

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Sahil Takiar
Gerrit-Comment-Date: Fri, 05 Jun 2020 19:32:21 +0000
Gerrit-HasComments: Yes
