Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/6023 )
Change subject: IMPALA-4848: Add WIDTH_BUCKET() function ...................................................................... Patch Set 10: (19 comments) http://gerrit.cloudera.org:8080/#/c/6023/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/6023/10//COMMIT_MSG@10 PS10, Line 10: width_bucket(expr decimal, min_val decimal, max_val decimal, num_buckets int) nit: long line http://gerrit.cloudera.org:8080/#/c/6023/10//COMMIT_MSG@13 PS10, Line 13: is divided into num_buckets buckets having identical sizes. This function nit: long line http://gerrit.cloudera.org:8080/#/c/6023/9/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/6023/9/be/src/exprs/math-functions-ir.cc@514 PS9, Line 514: dist_from_min = expr.template Subtract<int128_t>(input_scale, min_range, input_scale, > Right, this overflow won't happen without overflowing the subtraction on l If we never expect this to happen, make this a DCHECK and add a comment. http://gerrit.cloudera.org:8080/#/c/6023/9/be/src/exprs/math-functions-ir.cc@557 PS9, Line 557: return BigIntVal::null(); > Moved it above. I think it's bad practice to have checks "for safety". It makes the code slower and harder to reason about for people who are reading the code. If something is impossible, we should not be checking for it. Add a DCHECK instead. http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@438 PS10, Line 438: buckets nit: should be num_buckets instead of buckets http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@444 PS10, Line 444: result nit: "results" http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@455 PS10, Line 455: a nit: "an overflow" http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@482 PS10, Line 482: if (min_range >= max_range) { UNLIKELY http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@483 PS10, Line 483: ctx->SetError("Lower bound cannot be greater than or equal to the upper bound"); Add an end to end test for each error message. http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@487 PS10, Line 487: if (expr < min_range) return 0; UNLIKELY http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@489 PS10, Line 489: if (expr >= max_range) { UNLIKELY http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@504 PS10, Line 504: input_scale, input_precision, input_scale, false, &overflow); UNLIKELY(overflow) http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@505 PS10, Line 505: f( I think this check needs to happen above the line 487. Imagine if this overflows, but expr < min_range, then we will return zero. If we increase expr to be equal to min_range, this function will start returning null. Add a test case for this. http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@507 PS10, Line 507: error_msg << "Overflow while evaluating the difference between min_range: " << Add an end to end test for this error message (and others). http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@518 PS10, Line 518: error_msg << "Overflow while evaluating the difference between expr: " << Add an end to end test for this error message. http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@536 PS10, Line 536: if (abs(range_size.value()) * DecimalUtil::GetScaleMultiplier<int128_t>(input_scale) < I am not convinced this check is correct. In which case will this check be false? http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@538 PS10, Line 538: needs_int256 = true; Is it possible to tell statically that needs_int256 is false? For example, by looking at the precision and scale of min_range and max_range. We want to order the checks from fastest to slowest. First, do the static check. If static check did not rule out int256, do the leading zero check. If the leading zero check did not rule out the need for int256, do this check. If All the checks did not rule out int256, then set needs_int256 to true. http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions-ir.cc@566 PS10, Line 566: int128_t y = DecimalUtil::MultiplyByScale<int128_t>(range_size.value(), We need to DCHECK that it did not overflow here. (It should be easy to do if you rebase this patch because I modified this function recently and added an optional overflow dcheck) http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions.h File be/src/exprs/math-functions.h: http://gerrit.cloudera.org:8080/#/c/6023/10/be/src/exprs/math-functions.h@115 PS10, Line 115: static BigIntVal WidthBucketImpl(FunctionContext* ctx,const T1& expr, Does this need to be declared in this file? Isn't it an internal function? -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-Change-Number: 6023 Gerrit-PatchSet: 10 Gerrit-Owner: anujphadke <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: anujphadke <[email protected]> Gerrit-Comment-Date: Tue, 06 Feb 2018 00:00:05 +0000 Gerrit-HasComments: Yes
