Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 )
Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
......................................................................

Patch Set 13:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/15997/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15997/7//COMMIT_MSG@7
PS7, Line 7: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
> overall, I think this commit message is a bit verbose in terms of describin
ping - I think this still needs to be addressed

http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@41
PS13, Line 41: Testing:
since you ran all core tests you should add an entry here that says "Ran core tests"

http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@44
PS13, Line 44: 2 Run unit tests against other tables such as tpcds.store_sales
which unit tests are you talking about? if these are already included in the "core" tests, you don't need to include this.

http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@47
PS13, Line 47: select ndv(c_name, 1) "one", ndv(c_name, 2) two, ndv(c_name, 3) three,
: ndv(c_name, 4) as four, ndv(c_name, 5) as five, ndv(c_name, 6) as six,
: ndv(c_name, 7) as seven, ndv(c_name, 8) as eight, ndv(c_name, 9) as nine,
: ndv(c_name, 10) as ten
: from tpch.customer;
:
: select ndv(ss_sold_time_sk, 1) "one", ndv(ss_sold_time_sk, 2) two,
: ndv(ss_sold_time_sk, 3) three, ndv(ss_sold_time_sk, 4) as four,
: ndv(ss_sold_time_sk, 5) as five, ndv(ss_sold_time_sk, 6) as six,
: ndv(ss_sold_time_sk, 7) as seven, ndv(ss_sold_time_sk, 8) as eight,
: ndv(ss_sold_time_sk, 9) as nine, ndv(ss_sold_time_sk, 10) as ten
: from tpcds.store_sales;
I think adding these to the commit message makes it too verbose. You just need to mention that you ran ndv(column, precision) for all possible precision values (1-10).

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1441
PS13, Line 1441: ComputeSizeOfIntermediateTypeForNDV
this needs documentation.

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1468
PS13, Line 1468: int precision = log2(hll_len);
is there a reason this needs to be re-computed during each update?

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@196
PS13, Line 196: HLL_PRECISION = 10; // default precision
probably worth just renaming this to DEFAULT_HLL_PRECISION to keep it consistent with the MIN/MAX_HLL_PRECISION variables. you can remove the "default precision" comment as well.

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@203
PS13, Line 203: HLL_LEN
same here, rename to DEFAULT_HLL_LEN

http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@592
PS9, Line 592: if (fn_ == null)
> Can you please explain?
it should be:

  if (fn_ == null) {
    throw new AnalysisException(
        "A suitable intermediate data type can not be found for the second parameter "
        + children_.get(1).toSql() + " in NDV()");
  }

notice how there are curly braces around the body of the if statement

http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@352
PS9, Line 352: HLL_UPDATE_SYMBOL_T
> Change to HLL_UPDATE_SYMBOL_TWO_ARGS.
I think it can be more descriptive still. Something like HLL_UPDATE_SYMBOL_WITH_PRECISION would be better.

http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@65
PS13, Line 65: ArrayList
the declared type should be List instead of ArrayList: https://stackoverflow.com/questions/2279030/type-list-vs-type-arraylist-in-java

http://gerrit.cloudera.org:8080/#/c/15997/9/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/15997/9/tests/query_test/test_aggregation.py@318
PS9, Line 318: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
> These values numerate over all the columns in the select list.
ahh, I thought this was the same as the list on line 301, but they are different. but yeah, you should use xrange here and above instead, since it's much more concise.
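
To illustrate the xrange suggestion, here is a minimal sketch. It assumes the literal on line 318 is exactly the integers 0 through 10; the variable names are hypothetical and do not come from test_aggregation.py. The mention of xrange suggests the test code targets Python 2, so the snippet aliases xrange to range when run under Python 3.

```python
# Compatibility shim for illustration only: xrange exists in Python 2;
# in Python 3 the built-in range is already lazy, so alias it.
try:
    xrange
except NameError:
    xrange = range

# Hypothetical name; the review only shows the literal values 0..10.
column_indices_literal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Same values expressed as a range. The upper bound (11) is exclusive.
column_indices = list(xrange(11))

# The two spellings are interchangeable.
assert column_indices == column_indices_literal
```

If the values are only iterated over, the list() call can be dropped and xrange(11) used directly in the for loop.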

--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Fri, 05 Jun 2020 19:32:21 +0000
Gerrit-HasComments: Yes
