[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 13: (21 comments) Done and please review one more time. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1441 PS13, Line 1441: ComputeSizeOfIntermediateTypeForNDV > this needs documentation. Done http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1452 PS13, Line 1452: if (ctx->GetNumArgs() == 2) { > it seems this whole code block is only used to validate the input parameter Wrapped the entire block in #ifndef NDEBUG. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1457 PS13, Line 1457: if (int_val) > is this ever possible? can int_val be null if GetNumArgs() == 2? Yes. In a parallel plan, this method can be called for the merge() part of the expression evaluation, where the actual number of arguments is 1 (the intermediate result). http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1466 PS13, Line 1466: int hll_len = dst->len; > this copies the int value from dst->len to hll_len, is this necessary? if n Done http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1468 PS13, Line 1468: int precision = log2(hll_len); > is there a reason this needs to be re-computed during each update? I improved it a little bit, from log2(x) to BitUtil::CountTrailingZeros((unsigned int)size, sizeof(size) * CHAR_BIT). We could cache the precision value during HllInit() call to a replace and reuse. But I have not found the place to do so. If we use ctx, does it mean we need to add a new data member? http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1481 PS13, Line 1481: // Two input argument version, which calls the single input argument version > same comment as elsewhere Same answer as above. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1510 PS13, Line 1510: // Two input argument version which calls the single input argument version. > should be more descriptive that just "Two input argument version" Done http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1523 PS13, Line 1523: DCHECK_EQ(dst->len, src.len); > is there a reason the order of these got switched? Made the src.len the first argument in the DCHECK_EQ() which reads more natural. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1557 PS13, Line 1557: actual_harmonic_mean > it doesn't look like this variable exists anywhere, did you mean 'harmonic_ Done http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1575 PS13, Line 1575: > nit: extra new line Done http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@2467 PS13, Line 2467: // Two input argument version. > this could be more descriptive. what does the two input argument version do Done http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@192 PS13, Line 192: /// 2) HyperLogLog in Practice (paper from google with some improvements) > probably worth mentioning either here or elsewhere that these functions tak Done http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@196 PS13, Line 196: HLL_PRECISION = 10; // default precision > probably worth just renaming this to DEFAULT_HLL_PRECISION to keep it consi Done. I hesitated a little bit last on making the change, as this will affect several other modules (e.g., sampled_ndv, and test modules). http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@203 PS13, Line 203: HLL_LEN > same here, rename to DEFAULT_HLL_LEN Done http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@65 PS13, Line 65: HashMap > should be Map instead of HashMap Done http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@65 PS13, Line 65: ArrayList > should be List instead of ArrayList: https://stackoverflow.com/questions/22 Done http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@999 PS13, Line 999: fakedHllIntermediateType > it might be cleaner to just use the defaultHllIntermediateType here
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 13: (16 comments) some more comments http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1452 PS13, Line 1452: if (ctx->GetNumArgs() == 2) { it seems this whole code block is only used to validate the input parameters via DCHECKS. so it will only be run in debug builds, in which case it should be surrounded by a #ifdef NDEBUG ... #endif block. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1457 PS13, Line 1457: if (int_val) is this ever possible? can int_val be null if GetNumArgs() == 2? http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1466 PS13, Line 1466: int hll_len = dst->len; this copies the int value from dst->len to hll_len, is this necessary? if not, then can the int just be a const reference? http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1481 PS13, Line 1481: // Two input argument version, which calls the single input argument version same comment as elsewhere http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1510 PS13, Line 1510: // Two input argument version which calls the single input argument version. should be more descriptive that just "Two input argument version" http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1523 PS13, Line 1523: DCHECK_EQ(dst->len, src.len); is there a reason the order of these got switched? http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1557 PS13, Line 1557: actual_harmonic_mean it doesn't look like this variable exists anywhere, did you mean 'harmonic_mean' http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1575 PS13, Line 1575: nit: extra new line http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@2467 PS13, Line 2467: // Two input argument version. this could be more descriptive. what does the two input argument version do? saying something like "HLL version that accepts a precision value" would be more informative. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@192 PS13, Line 192: /// 2) HyperLogLog in Practice (paper from google with some improvements) probably worth mentioning either here or elsewhere that these functions take in an optional precision. there should be some description of what the precision means and the range of acceptable values. http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@65 PS13, Line 65: HashMap should be Map instead of HashMap http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@999 PS13, Line 999: fakedHllIntermediateType it might be cleaner to just use the defaultHllIntermediateType here instead of creating a fake one http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1009 PS13, Line 1009: ArrayList should be declared as a "List". http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1013 PS13, Line 1013: ndvList.add(AggregateFunction.createBuiltin(db, "ndv", : Lists.newArrayList(t, Type.INT), Type.BIGINT, hllIntermediateType, : prefix + "7HllInitEPN10impala_udf15FunctionContextEPNS1_9StringValE", : prefix + HLL_UPDATE_SYMBOL_TWO_ARGS.get(t), : prefix + "8HllMergeEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_", : null, : prefix + "11HllFinalizeEPN10impala_udf15FunctionContextERKNS1_9StringValE", : true, false, true)); it looks like this is pretty much a duplicate of the code on lines 998-1005, its probably work abstracting this into a private method that takes the intermediate type as a parameter. even better would be to replace the call on lines 983-990 with this common method, you would have to make the update symbol a parameter as well. http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1022 PS13, Line 1022:
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 13: (12 comments) http://gerrit.cloudera.org:8080/#/c/15997/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15997/7//COMMIT_MSG@7 PS7, Line 7: [WIP] IMPALA-2658: Extend the NDV function to accept a precision > overall, I think this commit message is a bit verbose in terms of describin ping - I think this still needs to be addressed http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@41 PS13, Line 41: Testing: since you ran all core tests you should add an entry here that says "Ran core tests" http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@44 PS13, Line 44: 2 Run unit tests against other tables such as tpcds.store_sales which unit tests are you talking about? if these are already include in "core" tests, you don't need to include this. http://gerrit.cloudera.org:8080/#/c/15997/13//COMMIT_MSG@47 PS13, Line 47: select ndv(c_name, 1) "one", ndv(c_name, 2) two, ndv(c_name, 3) three, : ndv(c_name, 4) as four, ndv(c_name, 5) as five, ndv(c_name, 6) as six, : ndv(c_name, 7) as seven, ndv(c_name, 8) as eight, ndv(c_name, 9) as nine, : ndv(c_name, 10) as ten : from tpch.customer; : : select ndv(ss_sold_time_sk, 1) "one", ndv(ss_sold_time_sk, 2) two, : ndv(ss_sold_time_sk, 3) three, ndv(ss_sold_time_sk, 4) as four, : ndv(ss_sold_time_sk, 5) as five, ndv(ss_sold_time_sk, 6) as six, : ndv(ss_sold_time_sk, 7) as seven, ndv(ss_sold_time_sk, 8) as eight, : ndv(ss_sold_time_sk, 9) as nine, ndv(ss_sold_time_sk, 10) as ten : from tpcds.store_sales; I think adding these in the commit message makes it too verbose. You just need to mention that you ran ndv(column, precision) for all possible values (1-10). http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1441 PS13, Line 1441: ComputeSizeOfIntermediateTypeForNDV this needs documentation. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1468 PS13, Line 1468: int precision = log2(hll_len); is there a reason this needs to be re-computed during each update? http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@196 PS13, Line 196: HLL_PRECISION = 10; // default precision probably worth just renaming this to DEFAULT_HLL_PRECISION to keep it consistent with the MIN/MAX_HLL_PRECISION variables. you can remove the "default precision" comment as well. http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@203 PS13, Line 203: HLL_LEN same here, rename to DEFAULT_HLL_LEN http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@592 PS9, Line 592: if (fn_ == null) > Can you please explain? it should be: if (fn_ == null) { throw new AnalysisException( "A suitable intermediate data type can not be found for the second parameter " + children_.get(1).toSql() + " in NDV()"); } notice how there are curly braces around the body of the if statement http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/15997/9/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@352 PS9, Line 352: HLL_UPDATE_SYMBOL_T > Change to HLL_UPDATE_SYMBOL_TWO_ARGS. I think it can be more descriptive still. Something like HLL_UPDATE_SYMBOL_WITH_PRECISION would be better. http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@65 PS13, Line 65: ArrayList should be List instead of ArrayList: https://stackoverflow.com/questions/2279030/type-list-vs-type-arraylist-in-java http://gerrit.cloudera.org:8080/#/c/15997/9/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: http://gerrit.cloudera.org:8080/#/c/15997/9/tests/query_test/test_aggregation.py@318 PS9, Line 318: 0, 1, 2,
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6222/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 13 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 05 Jun 2020 15:34:27 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Qifan Chen has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. [WIP] IMPALA-2658: Extend the NDV function to accept a precision This work addresses the current limitation in NDV function by extending the function to take the 2nd integer-typed argument, which must be an abstract value in the range of 1 to 10. This abstract value specifies a real precision value used in the HLL algorithm for the function. Front end work: 1. Add a new template ndv function in builtin db that takes two arguments. 2. Verify that the 2nd argument of a NDV() is an integer literal in [1,10]; 3. A new method to implement the mapping of the abstract value to the hll precision (the real work is TBD); 4. The length of the intermediate data type is computed based on the actual hll precision. When the 2nd argument is missing, the length is 1024 as in the current implementation; 5. The 2nd argument, if present, will be carried over all the way to the BE. Back end work: 1. Remove the hardcoded precision (10) from these functions: AggregateFunctions::HllInit(), AggregateFunctions::HllUpdate(), AggregateFunctions::HllMerge(), AggregateFunctions::HllFinalEstimate(), AggregateFunctions::HllFinalize(), HllEstimateBias(); 2. Instead, the actual precision is computed from the length of the intermediate data type as log2(hll_len); 3. Verify that the length of the intermediate data type is correct according to the 2nd argument (if present). Testing: 1 Add a regression test (test_ndv)) in TestAggregationQueries section to computes ndv() for every supported Impala data type. 2 Run unit tests against other tables such as tpcds.store_sales and tpch.customer in both serial and parallel plan settings. select ndv(c_name, 1) "one", ndv(c_name, 2) two, ndv(c_name, 3) three, ndv(c_name, 4) as four, ndv(c_name, 5) as five, ndv(c_name, 6) as six, ndv(c_name, 7) as seven, ndv(c_name, 8) as eight, ndv(c_name, 9) as nine, ndv(c_name, 10) as ten from tpch.customer; select ndv(ss_sold_time_sk, 1) "one", ndv(ss_sold_time_sk, 2) two, ndv(ss_sold_time_sk, 3) three, ndv(ss_sold_time_sk, 4) as four, ndv(ss_sold_time_sk, 5) as five, ndv(ss_sold_time_sk, 6) as six, ndv(ss_sold_time_sk, 7) as seven, ndv(ss_sold_time_sk, 8) as eight, ndv(ss_sold_time_sk, 9) as nine, ndv(ss_sold_time_sk, 10) as ten from tpcds.store_sales; Perf: TBD Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 --- M be/src/common/logging.h M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M tests/query_test/test_aggregation.py 6 files changed, 302 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/15997/13 -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 13 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 11: (13 comments) Fix style issues with the newly added test. http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@87 PS11, Line 87: [2, 9, 96, 988, 980, 1000, 944, 1030, 1020, 990, 1010, 957, 1030, 1027, 9845, 9898], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@88 PS11, Line 88: [2, 9, 97, 957, 1008, 1016, 1005, 963, 994, 993, 1018, 1004, 963, 1014, 10210, 10280], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@89 PS11, Line 89: [2, 9, 98, 977, 1024, 1020, 975, 977, 1002, 991, 994, 1006, 977, 999, 10118, 9923], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@90 PS11, Line 90: [2, 9, 99, 986, 1009, 1011, 994, 980, 997, 994, 1002, 997, 980, 988, 10148, 9987], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@91 PS11, Line 91: [2, 9, 99, 995, 996, 1000, 998, 988, 995, 999, 997, 999, 988, 979, 9974, 9960], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@92 PS11, Line 92: [2, 9, 99, 998, 1005, 999, 1003, 994, 1000, 993, 999, 998, 994, 992, 9899, 9941], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@93 PS11, Line 93: [2, 9, 99, 993, 1001, 1007, 1000, 998, 1002, 997, 999, 998, 998, 999, 9923, 9931], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@94 PS11, Line 94: [2, 9, 99, 994, 998, 1002, 1002, 999, 998, 999, 997, 1000, 999, 997, 9937, 9973], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@95 PS11, Line 95: [2, 9, 99, 995, 997, 998, 1001, 999, 1001, 996, 997, 1000, 999, 998, 9989, 9981], > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@96 PS11, Line 96: [2, 9, 99, 998, 998, 997, 999, 998, 1000, 998, 1000, 998, 998, 1000, 1, 10003] > flake8: E122 continuation line missing indentation or outdented Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@315 PS11, Line 315: # Verify that each ndv() value (one per column for a total of 11) is identical > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@316 PS11, Line 316: # to the corresponding known value. Since NDV() invokes Hash64() hash function > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@319 PS11, Line 319: - > flake8: E226 missing whitespace around arithmetic operator Done -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 11 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 05 Jun 2020 14:29:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6218/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 11 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 05 Jun 2020 01:16:18 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 11: (16 comments) http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@87 PS11, Line 87: [2, 9, 96, 988, 980, 1000, 944, 1030, 1020, 990, 1010, 957, 1030, 1027, 9845, 9898], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@88 PS11, Line 88: [2, 9, 97, 957, 1008, 1016, 1005, 963, 994, 993, 1018, 1004, 963, 1014, 10210, 10280], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@89 PS11, Line 89: [2, 9, 98, 977, 1024, 1020, 975, 977, 1002, 991, 994, 1006, 977, 999, 10118, 9923], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@90 PS11, Line 90: [2, 9, 99, 986, 1009, 1011, 994, 980, 997, 994, 1002, 997, 980, 988, 10148, 9987], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@91 PS11, Line 91: [2, 9, 99, 995, 996, 1000, 998, 988, 995, 999, 997, 999, 988, 979, 9974, 9960], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@92 PS11, Line 92: [2, 9, 99, 998, 1005, 999, 1003, 994, 1000, 993, 999, 998, 994, 992, 9899, 9941], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@93 PS11, Line 93: [2, 9, 99, 993, 1001, 1007, 1000, 998, 1002, 997, 999, 998, 998, 999, 9923, 9931], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@94 PS11, Line 94: [2, 9, 99, 994, 998, 1002, 1002, 999, 998, 999, 997, 1000, 999, 997, 9937, 9973], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@95 PS11, Line 95: [2, 9, 99, 995, 997, 998, 1001, 999, 1001, 996, 997, 1000, 999, 998, 9989, 9981], flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@96 PS11, Line 96: [2, 9, 99, 998, 998, 997, 999, 998, 1000, 998, 1000, 998, 998, 1000, 1, 10003] flake8: E122 continuation line missing indentation or outdented http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@96 PS11, Line 96: flake8: E241 multiple spaces after ',' http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@315 PS11, Line 315: flake8: W291 trailing whitespace http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@315 PS11, Line 315: # Verify that each ndv() value (one per column for a total of 11) is identical line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@316 PS11, Line 316: flake8: W291 trailing whitespace http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@316 PS11, Line 316: # to the corresponding known value. Since NDV() invokes Hash64() hash function line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/15997/11/tests/query_test/test_aggregation.py@319 PS11, Line 319: - flake8: E226 missing whitespace around arithmetic operator -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 11 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 05 Jun 2020 00:30:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision
Qifan Chen has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/15997 ) Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision .. [WIP] IMPALA-2658: Extend the NDV function to accept a precision This work addresses the current limitation in NDV function by extending the function to take the 2nd integer-typed argument, which must be an abstract value in the range of 1 to 10. This abstract value specifies a real precision value used in the HLL algorithm for the function. Front end work: 1. Add a new template ndv function in builtin db that takes two arguments. 2. Verify that the 2nd argument of a NDV() is an integer literal in [1,10]; 3. A new method to implement the mapping of the abstract value to the hll precision (the real work is TBD); 4. The length of the intermediate data type is computed based on the actual hll precision. When the 2nd argument is missing, the length is 1024 as in the current implementation; 5. The 2nd argument, if present, will be carried over all the way to the BE. Back end work: 1. Remove the hardcoded precision (10) from these functions: AggregateFunctions::HllInit(), AggregateFunctions::HllUpdate(), AggregateFunctions::HllMerge(), AggregateFunctions::HllFinalEstimate(), AggregateFunctions::HllFinalize(), HllEstimateBias(); 2. Instead, the actual precision is computed from the length of the intermediate data type as log2(hll_len); 3. Verify that the length of the intermediate data type is correct according to the 2nd argument (if present). Work TDB: 1. Add unit tests; 2. Determine the final mapping of the 10 abstract values to 10 possible Hll precisions. Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 --- M be/src/common/logging.h M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M tests/query_test/test_aggregation.py 6 files changed, 302 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/15997/11 -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 11 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar