[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 16: (1 comment) After fighting the compiler for over an hour to figure out why our operator<< was not being called, I found this little gem in glog/logging.h and since abandoned getting DCHECK_EQ / GT to work. / Build the error message string. template std::string* MakeCheckOpString(const t1& v1, const t2& v2, const char* names) { // It means that we cannot use stl_logging if compiler doesn't // support using expression for operator. // TODO(hamaji): Figure out a way to fix. #if 1 using ::operator<<; #endif std::strstream ss; ss << names << " (" << v1 << " vs. " << v2 << ")"; return new std::string(ss.str(), ss.pcount()); } http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG Commit Message: Line 64: Query: select sum(cast(l_extendedprice as bigint)) from > please pare down the commit msg, in particular get rid of the verbatim shel Done -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG Commit Message: Line 64: Query: select sum(cast(l_extendedprice as bigint)) from please pare down the commit msg, in particular get rid of the verbatim shell output (a table with the numbers is sufficient to get the point across). -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 10: (1 comment) I am trying to get the DCHECK_EQ to work with int128_t, for which we do provide a definition, but gcc keeps complaining about an ambiguous overload. http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS10, Line 38: if (round) { : // Decimal V2 behavior : : // Multiply the double by the scale. : // : // TODO: this could be made more precise by doing the computation in : // 64 bit with 128 bit immediate result instead of doing an additional : // multiply in 53-bit double precision floating point : : d *= DecimalUtil::GetScaleMultiplier(scale); : d = std::round(d); : const T max_value = DecimalUtil::GetScaleMultiplier(precision); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Return the rounded integer part. : return DecimalValue(static_cast(d)); : } else { : // TODO: IMPALA-4924: remove DECIMAL V1 code : : // Check overflow. : const T max_value = DecimalUtil::GetScaleMultiplier(precision - scale); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Multiply the double by the scale. : d *= DecimalUtil::GetScaleMultiplier(scale); : : // Truncate and just take the integer part. : return DecimalValue(static_cast(d)); : } > If it won't break DECIMAL_V1, may as well merge the two paths and add the t It looks like it will work and it doesn't change the speed. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#16). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. I have decided to wait on expanding the python test until the bugs with overflow are fixed, which will make it easier to test sane behavior. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-21 19:31:32 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=bf4d06517e3093ca:b053953b +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-21 19:06:05 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=e54f6fc2c21d63d0:da7b6baa +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-21 21:04:45 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=444a15a709f0672:74197af +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: [localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem; Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem Query
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 12: (8 comments) http://gerrit.cloudera.org:8080/#/c/5951/15//COMMIT_MSG Commit Message: Line 122 > Sure, we can try to get this back in follow on commits. Will check http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/benchmarks/expr-benchmark.cc File be/src/benchmarks/expr-benchmark.cc: PS15, Line 218: BENCHMARK("strncmp1", "'abcdefghi > Can you please add a sample output like other benchmark functions ? I can't - with these changes, this is in a slightly better state, but the benchmark still doesn't run, it segfaults almost immediately due to a NULL descriptor table. After spending the better part of a day attempting to fix it, I've had to shelve this for now. Considering that there are serious issues running this through codegen anyway, I'm not sure what to do with these changes. http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-test.cc File be/src/runtime/decimal-test.cc: PS15, Line 169: -.1 > -0.1. Done PS15, Line 224: Decimal4Value::FromDouble(t4, 0.4, true, ); > Isn't a duplicate of the test above ? This was supposed to be negative :) http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.h File be/src/runtime/decimal-value.h: PS15, Line 52: truncating digits that : /// cannot be represented or rounding if desired > rounding to the closest integer if 'round' is true. Truncate the decimal pl Done http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS10, Line 38: if (round) { : // Decimal V2 behavior : : // Multiply the double by the scale. : // : // Some quick research shows us that 10**38 in binary has exactly 53 : // significant nonzero bits. Coincidentally, double precision floating : // point gives us 53 bits for the mantissa. TODO: validate that the : // values produced by the template are both correct, constant and that : // the multiply does not lose precision. : : d *= DecimalUtil::GetScaleMultiplier(scale); : d = std::round(d); : const T max_value = DecimalUtil::GetScaleMultiplier(precision); : if (max_value < 0 || UNLIKELY(std::isnan(d)) || UNLIKELY(std::abs(d) >= max_value)) { : *overflow = true; : return DecimalValue(); : } : : // Return the rounded integer part. : return DecimalValue(static_cast(d)); : } else { : // TODO: IMPALA-4924: remove DECIMAL V1 code : : // Check overflow. : const T max_value = DecimalUtil::GetScaleMultiplier(precision - scale); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Multiply the double by the scale. : d *= DecimalUtil::GetScaleMultiplier(scale); : : > Are there particular cases you have in mind which may break if we use V2's None I can come up with right now, but this was before I noticed the bit pattern for 10^37 fits exactly in a a double. Now I'm reasonably sure it is safe. http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS12, Line 52: max_value < 0 > Yes, if it's not possible (i.e. only happens due to a software bug somewher Done http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS15, Line 132: > nit: long line fixed -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS12, Line 52: max_value < 0 > If we cast to a decimal type whose underlying width is not large enough to Yes, if it's not possible (i.e. only happens due to a software bug somewhere else), let's make it a DCHECK. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/15//COMMIT_MSG Commit Message: Line 122: do to get this back. Sure, we can try to get this back in follow on commits. But the other thing I wanted to check was whether perf for v2=0 regresses at all with this change. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Michael Ho has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 15: (6 comments) http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/benchmarks/expr-benchmark.cc File be/src/benchmarks/expr-benchmark.cc: PS15, Line 218: Benchmark* BenchmarkDecimalCast() { Can you please add a sample output like other benchmark functions ? http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-test.cc File be/src/runtime/decimal-test.cc: PS15, Line 169: -.1 -0.1. Also may help to add cases which cause overflow due to rounding. Decimal16Value::FromDouble(t3, 0.9, true, ); Decimal16Value::FromDouble(t3, 0.9, false, ); PS15, Line 224: Decimal4Value::FromDouble(t4, 0.4, true, ); Isn't a duplicate of the test above ? http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.h File be/src/runtime/decimal-value.h: PS15, Line 52: truncating digits that : /// cannot be represented or rounding if desired rounding to the closest integer if 'round' is true. Truncate the decimal places otherwise. http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS10, Line 38: if (round) { : // Decimal V2 behavior : : // Multiply the double by the scale. : // : // Some quick research shows us that 10**38 in binary has exactly 53 : // significant nonzero bits. Coincidentally, double precision floating : // point gives us 53 bits for the mantissa. TODO: validate that the : // values produced by the template are both correct, constant and that : // the multiply does not lose precision. : : d *= DecimalUtil::GetScaleMultiplier(scale); : d = std::round(d); : const T max_value = DecimalUtil::GetScaleMultiplier(precision); : DCHECK(max_value > 0); // no DCHECK_GT because of int128_t : if (UNLIKELY(std::isnan(d)) || UNLIKELY(std::abs(d) >= max_value)) { : *overflow = true; : return DecimalValue(); : } : : // Return the rounded integer part. : return DecimalValue(static_cast(d)); : } else { : // TODO: IMPALA-4924: remove DECIMAL V1 code : : // Check overflow. : const T max_value = DecimalUtil::GetScaleMultiplier(precision - scale); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Multiply the double by the scale. : d *= DecimalUtil::GetScaleMultiplier(scale); : > Sort of. I didn't want to change the original at all in this change though Are there particular cases you have in mind which may break if we use V2's logic for both version ? http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS15, Line 132: || nit: long line -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#15). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. I have decided to wait on expanding the python test until the bugs with overflow are fixed, which will make it easier to test sane behavior. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Note there is no free lunch: [localhost:21000] > set decimal_v2=0; DECIMAL_V2 set to 0 [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:16:49 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=c546f270316b2176:820bb667 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.76s [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:16:52 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=524bf2693849ce99:be999b03 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.73s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:16:59 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=ca4f8413061576d9:2389ccde +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.85s [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:17:02 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=4d4b5f9f181306c7:f6726c6 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.96s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: [localhost:21000] > set decimal_v2=0;
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#13). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M be/src/udf/udf.h 9 files changed, 467 insertions(+), 111 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/13 -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 12: (2 comments) http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS12, Line 49: double > shouldn't this be T? No, T may not have enough precision to hold the value of 10^scale - look at the old code, which also does the multiplication in double. PS12, Line 52: max_value < 0 > why is the max_value < 0 check needed? If we cast to a decimal type whose underlying width is not large enough to hold the precision, this will become true. Make it a DCHECK? This should be impossible, guaranteed by the type system. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 11: (5 comments) > I have not updated the Python test yet, as that would require a > rebase. Python test and benchmark will come in next patch set, > this is just to give a clean slate for review. Thanks. Just some minor comments on the code. I think the benchmark is higher priority than the python test since we already have pretty good functional test coverage and so less likely to be surprised, whereas there could be a perf regression. We could even do the python test as a follow on commit if that is a fair amount of work (perhaps even after taking care of divide rounding). http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS11, Line 1526: power(10, 3) - power(10, -1) at some point in the future, we may implement a decimal variant of POWER(), in which case these test cases would no longer exercise the CAST(double as decimal) path. So, how about either adding a check: // The CAST FLOAT -> DECIMAL test cases rely on power() returning // a double. If that changes, those test cases need to be updated. TestStringValue("typeOf(power(10,1))", "DOUBLE"); Or, we could add the explicit cast to double/float now. Line 1601: {{ false, -999, 3, 0 }}}, there all do DOUBLE -> DECIMAL. How about adding some FLOAT -> DECIMAL cases as well? http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/runtime/decimal-value.h File be/src/runtime/decimal-value.h: PS11, Line 232: Optionally delete "Optionally" since it's not optional. PS11, Line 233: otherwise the result is truncated. delete since this doesn't do truncation http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS11, Line 45: mantissaa mantissa -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 11: I have not updated the Python test yet, as that would require a rebase. Python test and benchmark will come in next patch set, this is just to give a clean slate for review. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#11). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M be/src/udf/udf.h 8 files changed, 374 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/11 -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS10, Line 127: typename RESULT_T::underlying_type_t > It's applicable to the next line too, right ? The following looks easier to I agree that does look nicer, but I can't do it without polluting the client namespace with whatever typedef I use, unless I put it inside another namespace, like detail, which doesn't really solve the problem either. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc File be/src/exprs/literal.cc: Line 185: value_.decimal4_val = Decimal4Value::FromDouble(type, v, , true); > Shouldn't this still be dependent on the behavior we want to test ? We test I really don't want to complicate tests any more than necessary so since this is just for tests, I just gave this the sane expected behavior. If this happens to break some tests, we'll find out and fix them, but this should not change Impala behavior. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/decimal-operators-ir.cc File be/src/exprs/decimal-operators-ir.cc: PS9, Line 303: round > Otherwise this is a behavior change. dv.ToInt detects overflow. to_type(d Yeah, I got that. What I'm saying is why not remove the 'round' paramter since we can't use it anyway? I originally suggested adding that parameter assuming it would eliminate the need for the else-clause, but since that's not the case, why keep the dead (and untested) code? http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1503: { true, 0, 3, 0 }}}, > due to scaling, we get coverage anyway as 0.45 ends in 5. I can add it, bu okay. i missed that the scale=1 case covers that. http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS9, Line 43: precise > Well, there aren't enough bits in the floating point multiplier to fully re sure, fixing in a separate change sounds fine. If there's no jira for that, can you open one and note the number in the todo? PS9, Line 119: DCHECK > Couldn't do this. DCHECK_EQ doesn't know how to print int128_t and larger ah, okay. http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 113: if (divisor == 1) { what's this for? is it an optimization for scale==0 case? -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/decimal-operators-ir.cc File be/src/exprs/decimal-operators-ir.cc: PS5, Line 303: return to_type(dv.whole_part(scale)); any reason not to have ToInt<>() handle this case too by passing in 'round' (similar to FromDouble())? http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc File be/src/exprs/literal.cc: Line 185: value_.decimal4_val = Decimal4Value::FromDouble(type, v, , true); where is this used? will it break compatibility? http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.h File be/src/runtime/decimal-value.h: PS5, Line 54: bool* overflow, : bool round here and below, let's keep input parameters first, then output params. PS5, Line 232: bool& overflow use 'bool* overflow'. we generally use pointers for output params to make it clearer at the callsite. Line 232: inline typename RESULT_T::underlying_type_t ToInt(int scale, bool& overflow) const; add a function comment. http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 50: if (abs(d) >= max_value) { is this correct? 'd' is already scaled here. please be sure to test the upper bounds. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#5). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. Still writing tests, but this is ready for human eyes to look at. Testing: In progress. Rouding to int works as expected, rounding from float seems to work as well. Expect a full test suite for both modes and all edge cases to be covered. Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M be/src/udf/udf.h 7 files changed, 149 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/5 -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Zach Amsden