[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

Sahil Takiar (Code Review) Wed, 24 Jun 2020 19:23:24 -0700

Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )


Change subject: IMPALA-2658: Extend the NDV function to accept a precision
......................................................................


Patch Set 39: Code-Review+1

(3 comments)

Looks good to me! Will upgrade to a +2 once the final few comments have been 
addressed.

Great work on this and thanks for your patience!

http://gerrit.cloudera.org:8080/#/c/15997/39//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15997/39//COMMIT_MSG@48
PS39, Line 48:    abs(<true_unique_value> - <estimated_unique_value>)
             :       / <true_unique_value>.
you don't need to fix this since, but for future reference code / mathematical 
formulas don't typically abide by the 72 character line wrap limit


http://gerrit.cloudera.org:8080/#/c/15997/39/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/15997/39/be/src/exprs/aggregate-functions-ir.cc@1466
PS39, Line 1466: #ifndef NDEBUG
               :   // Verify that the length of the intermediate data type is 
computable from
               :   // the precision as represented in the 2nd argument.
               :   if (ctx->GetNumArgs() == 2) {
               :     IntVal* int_val = 
reinterpret_cast<IntVal*>(ctx->GetConstantArg(1));
               :
               :     // In parallel plan, the merge() function takes only one 
argument which
               :     // is the intermediate data. Avoid check in such cases.
               :     if (int_val) {
               :       DCHECK_EQ(dst->len, 
ComputeHllLengthFromScale(int_val->val));
               :     }
               :   }
               : #endif
looking at this again, I think it would be best to move this into a static 
method and call it inside a DCHECK. since I made you move this into a NDEBUG 
block in the first place, I just did the re-factoring locally and tested that 
it works on DEBUG and RELEASE builds. here is the diff:

 diff --git a/be/src/exprs/aggregate-functions-ir.cc 
b/be/src/exprs/aggregate-functions-ir.cc
index f8ce81d..2f2dc9d 100644
--- a/be/src/exprs/aggregate-functions-ir.cc
+++ b/be/src/exprs/aggregate-functions-ir.cc
@@ -1443,13 +1443,11 @@ static inline int ComputePrecisionFromScale(int scale) {
   return scale + 8;
 }

-#ifndef NDEBUG
 // Compute the precision from a scale value. This method must be identical
 // to function ComputeHllLengthFromScale() defined in FunctionCallExpr.java.
 static inline int ComputeHllLengthFromScale(int scale) {
   return 1 << ComputePrecisionFromScale(scale);
 }
-#endif

 // Compute the precision from a hll length as log2(len) or # of trailing
 // zeros in length. For example, when len is 1024 = 2^10 = 0b10000000000,
@@ -1458,24 +1456,26 @@ static inline int ComputePrecisionFromHllLength(int 
hll_len) {
   return BitUtil::CountTrailingZeros((unsigned int)hll_len, sizeof(hll_len) * 
CHAR_BIT);
 }

-void AggregateFunctions::HllInit(FunctionContext* ctx, StringVal* dst) {
-  // The HLL functions use a preallocated FIXED_UDA_INTERMEDIATE intermediate 
value.
-  DCHECK_IN_RANGE(dst->len, MIN_HLL_LEN, MAX_HLL_LEN);
-  memset(dst->ptr, 0, dst->len);
-
-#ifndef NDEBUG
-  // Verify that the length of the intermediate data type is computable from
-  // the precision as represented in the 2nd argument.
+// Verify that the length of the intermediate data type is computable from
+// the precision as represented in the 2nd argument.
+static inline bool CheckHllArgs(FunctionContext* ctx, StringVal* dst) {
   if (ctx->GetNumArgs() == 2) {
     IntVal* int_val = reinterpret_cast<IntVal*>(ctx->GetConstantArg(1));

     // In parallel plan, the merge() function takes only one argument which
     // is the intermediate data. Avoid check in such cases.
     if (int_val) {
-      DCHECK_EQ(dst->len, ComputeHllLengthFromScale(int_val->val));
+      return dst->len == ComputeHllLengthFromScale(int_val->val);
     }
   }
-#endif
+  return true;
+}
+
+void AggregateFunctions::HllInit(FunctionContext* ctx, StringVal* dst) {
+  // The HLL functions use a preallocated FIXED_UDA_INTERMEDIATE intermediate 
value.
+  DCHECK_IN_RANGE(dst->len, MIN_HLL_LEN, MAX_HLL_LEN);
+  memset(dst->ptr, 0, dst->len);
+  DCHECK(CheckHllArgs(ctx, dst));
 }

 // Implementation for update functions. It accepts a precision.


http://gerrit.cloudera.org:8080/#/c/15997/39/be/src/exprs/aggregate-functions-ir.cc@1483
PS39, Line 1483: IR_ALWAYS_INLINE
can you add the following method comment here and on the other HllUpdate: "// 
Always inline in IR so that constants can be replaced."



--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 39
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Thu, 25 Jun 2020 02:22:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

Reply via email to