[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision

Sahil Takiar (Code Review) Fri, 05 Jun 2020 14:42:12 -0700

Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )


Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
......................................................................


Patch Set 13:

(16 comments)

some more comments

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1452
PS13, Line 1452:   if (ctx->GetNumArgs() == 2) {
it seems this whole code block is only used to validate the input parameters 
via DCHECKS. so it will only be run in debug builds, in which case it should be 
surrounded by a #ifdef NDEBUG ... #endif block.


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1457
PS13, Line 1457: if (int_val)
is this ever possible? can int_val be null if GetNumArgs() == 2?


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1466
PS13, Line 1466:   int hll_len = dst->len;
this copies the int value from dst->len to hll_len, is this necessary? if not, 
then can the int just be a const reference?


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1481
PS13, Line 1481: // Two input argument version, which calls the single input 
argument version
same comment as elsewhere


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1510
PS13, Line 1510: // Two input argument version which calls the single input 
argument version.
should be more descriptive that just "Two input argument version"


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1523
PS13, Line 1523:   DCHECK_EQ(dst->len, src.len);
is there a reason the order of these got switched?


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1557
PS13, Line 1557: actual_harmonic_mean
it doesn't look like this variable exists anywhere, did you mean 'harmonic_mean'


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@1575
PS13, Line 1575:
nit: extra new line


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions-ir.cc@2467
PS13, Line 2467: // Two input argument version.
this could be more descriptive. what does the two input argument version do? 
saying something like "HLL version that accepts a precision value" would be 
more informative.


http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/15997/13/be/src/exprs/aggregate-functions.h@192
PS13, Line 192:   /// 2) HyperLogLog in Practice (paper from google with some 
improvements)
probably worth mentioning either here or elsewhere that these functions take in 
an optional precision. there should be some description of what the precision 
means and the range of acceptable values.


http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@65
PS13, Line 65: HashMap
should be Map instead of HashMap


http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@999
PS13, Line 999: fakedHllIntermediateType
it might be cleaner to just use the defaultHllIntermediateType here instead of 
creating a fake one


http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1009
PS13, Line 1009: ArrayList
should be declared as a "List".


http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1013
PS13, Line 1013:         ndvList.add(AggregateFunction.createBuiltin(db, "ndv",
               :             Lists.newArrayList(t, Type.INT), Type.BIGINT, 
hllIntermediateType,
               :             prefix + 
"7HllInitEPN10impala_udf15FunctionContextEPNS1_9StringValE",
               :             prefix + HLL_UPDATE_SYMBOL_TWO_ARGS.get(t),
               :             prefix + 
"8HllMergeEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_",
               :             null,
               :             prefix + 
"11HllFinalizeEPN10impala_udf15FunctionContextERKNS1_9StringValE",
               :             true, false, true));
it looks like this is pretty much a duplicate of the code on lines 998-1005, 
its probably work abstracting this into a private method that takes the 
intermediate type as a parameter.
even better would be to replace the call on lines 983-990 with this common 
method, you would have to make the update symbol a parameter as well.


http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1022
PS13, Line 1022:       builtinNDVs_.put(t, ndvList);
isn't of adding all of these into a HashMap, what about just creating the 
AggregateFunction at runtime? e.g. instead of pre-populating all the possible 
AggregateFunctions for all possible intermediate types and all possible data 
types, just add a method here that creates it on-demand? FunctionCallExpr can 
then just call that method? I think its simpler than using this Map<Type, 
List<AggregateFunction>> object; there shouldn't be much overhead in just 
re-creating the object time since its small.


http://gerrit.cloudera.org:8080/#/c/15997/13/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1365
PS13, Line 1365: ArrayList
nit: should be List



--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Fri, 05 Jun 2020 21:41:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-2658: Extend the NDV function to accept a precision

Reply via email to