[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 16:

(1 comment)

After fighting the compiler for over an hour to figure out why our operator<< 
was not being called, I found this little gem in glog/logging.h and since 
abandoned getting DCHECK_EQ / GT to work.

/ Build the error message string.
template
std::string* MakeCheckOpString(const t1& v1, const t2& v2, const char* names) {
  // It means that we cannot use stl_logging if compiler doesn't
  // support using expression for operator.
  // TODO(hamaji): Figure out a way to fix.
#if 1
  using ::operator<<;
#endif
  std::strstream ss;
  ss << names << " (" << v1 << " vs. " << v2 << ")";
  return new std::string(ss.str(), ss.pcount());
}

http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG
Commit Message:

Line 64: Query: select sum(cast(l_extendedprice as bigint)) from
> please pare down the commit msg, in particular get rid of the verbatim shel
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG
Commit Message:

Line 64: Query: select sum(cast(l_extendedprice as bigint)) from
please pare down the commit msg, in particular get rid of the verbatim shell 
output (a table with the numbers is sufficient to get the point across).


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 10:

(1 comment)

I am trying to get the DCHECK_EQ to work with int128_t, for which we do provide 
a definition, but gcc keeps complaining about an ambiguous overload.

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
 : // Decimal V2 behavior
 : 
 : // Multiply the double by the scale.
 : //
 : // TODO: this could be made more precise by doing the 
computation in
 : // 64 bit with 128 bit immediate result instead of doing an 
additional
 : // multiply in 53-bit double precision floating point
 : 
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : d = std::round(d);
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Return the rounded integer part.
 : return DecimalValue(static_cast(d));
 :   } else {
 : // TODO: IMPALA-4924: remove DECIMAL V1 code
 : 
 : // Check overflow.
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision - scale);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Multiply the double by the scale.
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : 
 : // Truncate and just take the integer part.
 : return DecimalValue(static_cast(d));
 :   }
> If it won't break DECIMAL_V1, may as well merge the two paths and add the t
It looks like it will work and it doesn't change the speed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#16).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

I have decided to wait on expanding the python test until the bugs
with overflow are fixed, which will make it easier to test sane
behavior.

[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 0|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5 |
+-+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 1|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.6 |
+-+
Fetched 1 row(s) in 0.01s

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:31:32 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=bf4d06517e3093ca:b053953b
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:06:05 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=e54f6fc2c21d63d0:da7b6baa
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] >  select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:04:45 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=444a15a709f0672:74197af
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query 

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 12:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5951/15//COMMIT_MSG
Commit Message:

Line 122
> Sure, we can try to get this back in follow on commits.
Will check


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/benchmarks/expr-benchmark.cc
File be/src/benchmarks/expr-benchmark.cc:

PS15, Line 218:   BENCHMARK("strncmp1", "'abcdefghi
> Can you please add a sample output like other benchmark functions ?
I can't - with these changes, this is in a slightly better state, but the 
benchmark still doesn't run, it segfaults almost immediately due to a NULL 
descriptor table.  After spending the better part of a day attempting to fix 
it, I've had to shelve this for now.  Considering that there are serious issues 
running this through codegen anyway, I'm not sure what to do with these changes.


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

PS15, Line 169: -.1
> -0.1.
Done


PS15, Line 224: Decimal4Value::FromDouble(t4, 0.4, true, );
> Isn't a duplicate of the test above ?
This was supposed to be negative :)


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS15, Line 52: truncating digits that
 :   /// cannot be represented or rounding if desired
> rounding to the closest integer if 'round' is true. Truncate the decimal pl
Done


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
 : // Decimal V2 behavior
 : 
 : // Multiply the double by the scale.
 : //
 : // Some quick research shows us that 10**38 in binary has 
exactly 53
 : // significant nonzero bits.  Coincidentally, double 
precision floating
 : // point gives us 53 bits for the mantissa.  TODO: validate 
that the
 : // values produced by the template are both correct, 
constant and that
 : // the multiply does not lose precision.
 : 
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : d = std::round(d);
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision);
 : if (max_value < 0 || UNLIKELY(std::isnan(d)) || 
UNLIKELY(std::abs(d) >= max_value)) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Return the rounded integer part.
 : return DecimalValue(static_cast(d));
 :   } else {
 : // TODO: IMPALA-4924: remove DECIMAL V1 code
 : 
 : // Check overflow.
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision - scale);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Multiply the double by the scale.
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : 
 :
> Are there particular cases you have in mind which may break if we use V2's 
None I can come up with right now, but this was before I noticed the bit 
pattern for 10^37 fits exactly in a a double.  Now I'm reasonably sure it is 
safe.


http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 52: max_value < 0
> Yes, if it's not possible (i.e. only happens due to a software bug somewher
Done


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS15, Line 132: 
> nit: long line
fixed


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 52: max_value < 0
> If we cast to a decimal type whose underlying width is not large enough to 
Yes, if it's not possible (i.e. only happens due to a software bug somewhere 
else), let's make it a DCHECK.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/15//COMMIT_MSG
Commit Message:

Line 122: do to get this back.
Sure, we can try to get this back in follow on commits.

But the other thing I wanted to check was whether perf for v2=0 regresses at 
all with this change.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/benchmarks/expr-benchmark.cc
File be/src/benchmarks/expr-benchmark.cc:

PS15, Line 218: Benchmark* BenchmarkDecimalCast() {
Can you please add a sample output like other benchmark functions ?


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

PS15, Line 169: -.1
-0.1.

Also may help to add cases which cause overflow due to rounding.

Decimal16Value::FromDouble(t3, 0.9, true, );
Decimal16Value::FromDouble(t3, 0.9, false, );


PS15, Line 224: Decimal4Value::FromDouble(t4, 0.4, true, );
Isn't a duplicate of the test above ?


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS15, Line 52: truncating digits that
 :   /// cannot be represented or rounding if desired
rounding to the closest integer if 'round' is true. Truncate the decimal places 
otherwise.


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
 : // Decimal V2 behavior
 : 
 : // Multiply the double by the scale.
 : //
 : // Some quick research shows us that 10**38 in binary has 
exactly 53
 : // significant nonzero bits.  Coincidentally, double 
precision floating
 : // point gives us 53 bits for the mantissa.  TODO: validate 
that the
 : // values produced by the template are both correct, 
constant and that
 : // the multiply does not lose precision.
 : 
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : d = std::round(d);
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision);
 : DCHECK(max_value > 0);  // no DCHECK_GT because of int128_t
 : if (UNLIKELY(std::isnan(d)) || UNLIKELY(std::abs(d) >= 
max_value)) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Return the rounded integer part.
 : return DecimalValue(static_cast(d));
 :   } else {
 : // TODO: IMPALA-4924: remove DECIMAL V1 code
 : 
 : // Check overflow.
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision - scale);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Multiply the double by the scale.
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : 
> Sort of.  I didn't want to change the original at all in this change though
Are there particular cases you have in mind which may break if we use V2's 
logic for both version ?


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS15, Line 132: ||
nit: long line


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#15).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

I have decided to wait on expanding the python test until the bugs
with overflow are fixed, which will make it easier to test sane
behavior.

[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 0|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5 |
+-+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 1|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.6 |
+-+
Fetched 1 row(s) in 0.01s

Note there is no free lunch:

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:49 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=c546f270316b2176:820bb667
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.76s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:52 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=524bf2693849ce99:be999b03
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.73s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:59 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=ca4f8413061576d9:2389ccde
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.85s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:17:02 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=4d4b5f9f181306c7:f6726c6
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.96s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

[localhost:21000] > set decimal_v2=0;

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#13).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 0|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5 |
+-+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 1|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.6 |
+-+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 49: double
> shouldn't this be T?
No, T may not have enough precision to hold the value of 10^scale - look at the 
old code, which also does the multiplication in double.


PS12, Line 52: max_value < 0
> why is the max_value < 0 check needed?
If we cast to a decimal type whose underlying width is not large enough to hold 
the precision, this will become true.

Make it a DCHECK?  This should be impossible, guaranteed by the type system.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 11:

(5 comments)

> I have not updated the Python test yet, as that would require a
 > rebase.  Python test and benchmark will come in next patch set,
 > this is just to give a clean slate for review.

Thanks.  Just some minor comments on the code. 

I think the benchmark is higher priority than the python test since we already 
have pretty good functional test coverage and so less likely to be surprised, 
whereas there could be a perf regression.

We could even do the python test as a follow on commit if that is a fair amount 
of work (perhaps even after taking care of divide rounding).

http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS11, Line 1526: power(10, 3) - power(10, -1)
at some point in the future, we may implement a decimal variant of POWER(), in 
which case these test cases would no longer exercise the CAST(double as 
decimal) path.  So, how about either adding a check:

// The CAST FLOAT -> DECIMAL test cases rely on power() returning
// a double. If that changes, those test cases need to be updated.
TestStringValue("typeOf(power(10,1))", "DOUBLE");

Or, we could add the explicit cast to double/float now.


Line 1601: {{ false, -999, 3, 0 }}},
there all do DOUBLE -> DECIMAL.  How about adding some FLOAT -> DECIMAL cases 
as well?


http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS11, Line 232: Optionally
delete "Optionally" since it's not optional.


PS11, Line 233: otherwise the result is truncated.
delete since this doesn't do truncation


http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS11, Line 45: mantissaa
mantissa


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-15 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 11:

I have not updated the Python test yet, as that would require a rebase.  Python 
test and benchmark will come in next patch set, this is just to give a clean 
slate for review.

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-15 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#11).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.

[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 0|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5 |
+-+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 1|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.6 |
+-+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
8 files changed, 374 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-15 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 127: typename RESULT_T::underlying_type_t
> It's applicable to the next line too, right ? The following looks easier to
I agree that does look nicer, but I can't do it without polluting the client 
namespace with whatever typedef I use, unless I put it inside another 
namespace, like detail, which doesn't really solve the problem either.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-15 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185: value_.decimal4_val = Decimal4Value::FromDouble(type, v, 
, true);
> Shouldn't this still be dependent on the behavior we want to test ? We test
I really don't want to complicate tests any more than necessary so since this 
is just for tests, I just gave this the sane expected behavior.  If this 
happens to break some tests, we'll find out and fix them, but this should not 
change Impala behavior.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS9, Line 303: round
> Otherwise this is a behavior change.  dv.ToInt detects overflow.  to_type(d
Yeah, I got that.  What I'm saying is why not remove the 'round' paramter since 
we can't use it anyway?  I originally suggested adding that parameter assuming 
it would eliminate the need for the else-clause, but since that's not the case, 
why keep the  dead (and untested) code?


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1503:  { true, 0, 3, 0 }}},
> due to scaling, we get coverage anyway as 0.45 ends in 5.  I can add it, bu
okay.  i missed that the scale=1 case covers that.


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS9, Line 43: precise
> Well, there aren't enough bits in the floating point multiplier to fully re
sure, fixing in a separate change sounds fine.  If there's no jira for that, 
can you open one and note the number in the todo?


PS9, Line 119: DCHECK
> Couldn't do this.  DCHECK_EQ doesn't know how to print int128_t and larger 
ah, okay.


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 113:   if (divisor == 1) {
what's this for? is it an optimization for scale==0 case?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-13 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS5, Line 303: return to_type(dv.whole_part(scale)); 
any reason not to have ToInt<>() handle this case too by passing in 'round' 
(similar to FromDouble())?


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185: value_.decimal4_val = Decimal4Value::FromDouble(type, v, 
, true);
where is this used? will it break compatibility?


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS5, Line 54: bool* overflow,
:   bool round
here and below, let's keep input parameters first, then output params.


PS5, Line 232:  bool& overflow
use 'bool* overflow'. we generally use pointers for output params to make it 
clearer at the callsite.


Line 232:   inline typename RESULT_T::underlying_type_t ToInt(int scale, bool& 
overflow) const;
add a function comment.


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 50: if (abs(d) >= max_value) {
is this correct? 'd' is already scaled here.  please be sure to test the upper 
bounds.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-10 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  Still writing tests,
but this is ready for human eyes to look at.

Testing: In progress.  Rouding to int works as expected, rounding from
float seems to work as well.
Expect a full test suite for both modes and all edge cases to be covered.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
7 files changed, 149 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Zach Amsden