[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

Taras Bobrovytsky (Code Review) Mon, 05 Feb 2018 16:00:22 -0800

Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/6023 )


Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
......................................................................


Patch Set 10:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6023/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/6023/10//COMMIT_MSG@10
PS10, Line 10: width_bucket(expr decimal, min_val decimal, max_val decimal, 
num_buckets int)
nit: long line


http://gerrit.cloudera.org:8080/#/c/6023/10//COMMIT_MSG@13
PS10, Line 13: is divided into num_buckets buckets having identical sizes. This 
function
nit: long line


http://gerrit.cloudera.org:8080/#/c/6023/9/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/6023/9/be/src/exprs/math-functions-ir.cc@514
PS9, Line 514:   dist_from_min = expr.template Subtract<int128_t>(input_scale, 
min_range, input_scale,
> Right, this overflow won't happen without overflowing the subtraction on l
If we never expect this to happen, make this a DCHECK and add a comment.


http://gerrit.cloudera.org:8080/#/c/6023/9/be/src/exprs/math-functions-ir.cc@557
PS9, Line 557:       return BigIntVal::null();
> Moved it above.
I think it's bad practice to have checks "for safety". It makes the code slower 
and harder to reason about for people who are reading the code. If something is 
impossible, we should not be checking for it. Add a DCHECK instead.


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@438
PS10, Line 438: buckets
nit: should be num_buckets instead of buckets


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@444
PS10, Line 444: result
nit: "results"


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@455
PS10, Line 455: a
nit: "an overflow"


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@482
PS10, Line 482:   if (min_range >= max_range) {
UNLIKELY


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@483
PS10, Line 483:     ctx->SetError("Lower bound cannot be greater than or equal 
to the upper bound");
Add an end to end test for each error message.


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@487
PS10, Line 487:   if (expr < min_range) return 0;
UNLIKELY


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@489
PS10, Line 489:   if (expr >= max_range) {
UNLIKELY


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@504
PS10, Line 504:       input_scale, input_precision, input_scale, false, 
&overflow);
UNLIKELY(overflow)


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@505
PS10, Line 505: f(
I think this check needs to happen above the line 487. Imagine if this 
overflows, but expr < min_range, then we will return zero. If we increase expr 
to be equal to min_range, this function will start returning null.

Add a test case for this.


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@507
PS10, Line 507:     error_msg << "Overflow while evaluating the difference 
between min_range: " <<
Add an end to end test for this error message (and others).


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@518
PS10, Line 518:     error_msg << "Overflow while evaluating the difference 
between expr: " <<
Add an end to end test for this error message.


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@536
PS10, Line 536:   if (abs(range_size.value()) * 
DecimalUtil::GetScaleMultiplier<int128_t>(input_scale) <
I am not convinced this check is correct. In which case will this check be 
false?


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@538
PS10, Line 538:     needs_int256 = true;
Is it possible to tell statically that needs_int256 is false? For example, by 
looking at the precision and scale of min_range and max_range.

We want to order the checks from fastest to slowest. First, do the static 
check. If static check did not rule out int256, do the leading zero check. If 
the leading zero check did not rule out the need for int256, do this check. If 
All the checks did not rule out int256, then set needs_int256 to true.


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@566
PS10, Line 566:     int128_t y = 
DecimalUtil::MultiplyByScale<int128_t>(range_size.value(),
We need to DCHECK that it did not overflow here. (It should be easy to do if 
you rebase this patch because I modified this function recently and added an 
optional overflow dcheck)


http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions.h
File be/src/exprs/math-functions.h:

http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions.h@115
PS10, Line 115:   static BigIntVal WidthBucketImpl(FunctionContext* ctx,const 
T1& expr,
Does this need to be declared in this file? Isn't it an internal function?



--
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-Change-Number: 6023
Gerrit-PatchSet: 10
Gerrit-Owner: anujphadke <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
Gerrit-Comment-Date: Tue, 06 Feb 2018 00:00:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

Reply via email to