[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..

IMPALA-3436: Return a decimal when rounding a double

Before this patch, the function round(double, int) returned a double,
which is inconsistent with other round() functions.

In this patch, we change the return type of round(double, int) to be
decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled,
the function returns a double, as before.

This is implemented by introducing a translateion where we make the
parser aware of query options and rename the function to round_v1()
during parsing if decimal_v2 is not enabled. To make this translation
invisible, we also rename it back to round() in toSql().

This translation should be removed when we remove decimal v1.

Testing:
- Added EE tests.

Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
---
M be/src/exprs/decimal-functions-ir.cc
M be/src/exprs/decimal-functions.h
M be/src/exprs/decimal-operators.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
10 files changed, 268 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/4
--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 4
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8398/3/be/src/exprs/decimal-functions-ir.cc
File be/src/exprs/decimal-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8398/3/be/src/exprs/decimal-functions-ir.cc@119
PS3, Line 119: Decimal16Value
> Do we always need Decimal16? Or should we switch on ByteSize() like the oth
Yes, the result is a decimal with precision 38, so it always requires 16 bytes.


http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285
PS3, Line 285:   [['round_v1','dround_v1'],
> I wonder if we should keep the functionality around under another alias in
Ok, are you suggesting to leave the code as is right now and add "fround" when 
we get rid of decimal_v1?

Also, added comment.


http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@289
PS3, Line 289:   [['round','dround','round_v1','dround_v1'],
> This function is used both decimal_v1/v2 modes.
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@407
PS1, Line 407: Type == nu
> situation
This comment was modified in patch 2.


http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2439
PS3, Line 2439: AnalyzesOk("select round(cast(1.123 as double), int_col) 
from functional.alltypes",
> add a positive test with a non-NULL constant second argument to both v1/v2
Done



--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Nov 2017 00:35:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6183: Fix Decimal to Double conversion

2017-11-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8547


Change subject: IMPALA-6183: Fix Decimal to Double conversion
..

IMPALA-6183: Fix Decimal to Double conversion

When converting a decimal to a double, we incorrectly used the powf()
function in the backend, which returns a float instead of a double.
This caused us to lose precision.

We fix the problem by replacing the powf() function with a pow()
function, which returns a double.

Testing:
- Added an EE test.

Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3
---
M be/src/runtime/decimal-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
2 files changed, 27 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8547/1
--
To view, visit http://gerrit.cloudera.org:8080/8547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3
Gerrit-Change-Number: 8547
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8404 )

Change subject: IMPALA-5017: Error on decimal overflow
..

IMPALA-5017: Error on decimal overflow

Before this patch, decimal operations would either silently overflow (in
the case of sum() and avg()), or produce a warning.

In this patch, the behaviour is changed so that an error is produced in
the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behaviour is
unchanged.

We introduce overflow checks when computing sum() and avg(). This
results in a ~30% performance regression when we are in decimal v2 mode
compared to decimal v1.

Benchmarks:
  Query:
  select sum(dec_38_19) from decimal_tbl
Decimal v1: 13.09s
Decimal v2: 17.10s

  Query:
  select avg(dec_38_19) from decimal_tbl
Decimal v1: 13.60s
Decimal v2: 19.10s

  Query:
  select avg(dec_9_5) from decimal_tbl
Decimal v1: 12.06s
Decimal v2: 18.07s

Testing:
- Added several end to end tests.
- Updated Expr tests to check for error in case of overflow.

Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
6 files changed, 214 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/8404/2
--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8404 )

Change subject: IMPALA-5017: Error on decimal overflow
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@12
PS1, Line 12: In this patch, the beharior is changed so that an error is 
produced in
> I'll also accept behaviour ;)
Done


http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@27
PS1, Line 27:   select avg(dec_38_19) from decimal_tbl
> I guess these regressions occur regardless of the width of the input decima
Correct. Added another benchmark to illustrate this.


http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@413
PS1, Line 413:   abs(avg->sum_val16) > 
DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val4))) {
> This isn't correct if sum_val16 and src.val* have opposite sign, is it?
That's true. Fixed and added a test case.


http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@420
PS1, Line 420:   abs(avg->sum_val16) > 
DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val8))) {
> What do you think about only checking for overflow of the integer type at t
1. Are you suggesting to check for something like abs(avg->sum_val16) > 2**128 
- abs(src.val8))) ?
I don't really see why it is faster to check against 2**128 instead of against 
MAX_UNSCALED_DECIMAL.

2. What do you mean by initial check? Where would this check be done? Something 
like (decimal_v2 && sum_val16 > 2**32 && abs(avg->sum_val16) > 
MAX_UNSCALED_DECIMAL - abs(src.val8)))?



--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 07 Nov 2017 21:59:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-11-06 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/8309 )

Change subject: IMPALA-5019: Decimal V2 addition
..

IMPALA-5019: Decimal V2 addition

In this patch, we implement the new decimal return type rules for
addition expressions. These rules become active when the query option
DECIMAL_V2 is enabled. The algorithm for determining the type of the
result is described in the JIRA.

DECIMAL V1:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,38) |
++

DECIMAL V2:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,6)  |
++

This patch required backend changes. We implement an algorithm where
we handle the whole and fractional parts separately, and then combine
them to get the final result. This is more complex and slower. We try
to avoid this by first checking if the result would fit into int128.

Testing:
- Added expr tests.
- Tested locally on my machine with a script that generates random
  decimal numbers and checks that Impala adds them correctly.

Performance:

For the common case, performance remains the same.
  select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1)

  BEFORE: 4.74s
  AFTER:  4.73s

In this case, we check if it is necessary to do the complex addition,
and it turns out to be not necessary. We see a slowdown because the
result needs to be scaled down by dividing.
  select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19)

  BEFORE: 1.63s
  AFTER:  13.57s

In following case, we take the most complex path and see the most
signification perfmance hit.
  select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37))

  BEFORE: 1.63s
  AFTER: 20.57

Change-Id: I401049c56d910eb1546a178c909c923b01239336
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
5 files changed, 392 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/4
--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 4
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-11-06 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8309 )

Change subject: IMPALA-5019: Decimal V2 addition
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h@179
PS3, Line 179: int128_t& x_left, int128_t& x_right, int128_t& y_left, 
int128_t& y_right) {
> Use pointers for output arguments: https://google.github.io/styleguide/cppg
Done


http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h@214
PS3, Line 214: right_overflow
> carry_to_left might be a better name? Since the intent is to carry this int
Agreed, changed.


http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h@243
PS3, Line 243: inline int128_t SubtractLarge(int128_t x, int x_scale, int128_t 
y, int y_scale,
> What do you think about make it so that this always computes x - y, and mak
The way it is currently implemented, I don't see an advantage in doing that. We 
do not use the subtraction operation in this function, we still add x and y 
together.

Are you suggesting to rewrite it, so that x and y are both expected to be 
positive and we will be doing x - y here? It's not quite clear to me which case 
would be eliminated if we rewrite it that way. (Both cases on line 264 would 
still need to be present)


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@305
PS2, Line 305: RESULT_T y = 0;
> Ah ok. I wonder if renaming the variable would make it clearer. Something l
Ranamed to result_scale_decrease


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@310
PS2, Line 310:   }
> Thanks. I still think something is missing in the AdjustToSameScale() name
Modified the comment of AdjustToSameScale() slightly. I was  debating changing 
the name to AdjustToMaxScale(), but that might be confusing (because max scale 
is 38)).



--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 07 Nov 2017 01:43:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-06 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..

IMPALA-3436: Return a decimal when rounding a double

Before this patch, the function round(double, int) returned a double,
which is inconsistent with other round() functions.

In this patch, we change the return type of round(double, int) to be
decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled,
the function returns a double, as before.

This is implemented by introducing a translateion where we make the
parser aware of query options and rename the function to round_v1()
during parsing if decimal_v2 is not enabled. To make this translation
invisible, we also rename it back to round() in toSql().

This translation should be removed when we remove decimal v1.

Testing:
- Added EE tests.

Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
---
M be/src/exprs/decimal-functions-ir.cc
M be/src/exprs/decimal-functions.h
M be/src/exprs/decimal-operators.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
10 files changed, 264 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/3
--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-06 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..

IMPALA-3436: Return a decimal when rounding a double

Before this patch, the function round(double, int) returned a double,
which is inconsistent with other round() functions.

In this patch, we change the return type of round(double, int) to be
decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled,
the function returns a double, as before.

This is implemented by introducing a translateion where we make the
parser aware of query options and rename the function to round_v1()
during parsing if decimal_v2 is not enabled. To make this translation
invisible, we also rename it back to round() in toSql().

This translation should be removed when we remove decimal v1.

Testing:
- Added EE tests.

Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
---
M be/src/exprs/decimal-functions-ir.cc
M be/src/exprs/decimal-functions.h
M be/src/exprs/decimal-operators.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
10 files changed, 264 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/2
--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-06 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG@16
PS1, Line 16: This is implemented by introducing a "hack" where we make the 
parser
> How about we say rewrite or translation instead of hack? I don't think it's
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc
File be/src/exprs/decimal-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@111
PS1, Line 111:   DCHECK(!scale.is_null);
> Add a FE test case in AnalyzeExprsTest.java that makes sure this is enforce
Added a test.

I think we should remain consistent with other decimal round functions. They do 
not allow a non-constant round arguments, so I think it makes sense to not 
allow it here as well.


http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@116
PS1, Line 116:   bool ovf = false;
> overflow
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h
File be/src/exprs/decimal-functions.h:

http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h@58
PS1, Line 58:   FunctionContext*, const DoubleVal&, const IntVal&);
> add arg names for consistency
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@284
PS1, Line 284:   [['round_v1','dround_v1'],
> Why keep the dround_v1 alias?
Because what if someone typed in "select dround(x)". We need to be able to 
differentiate between round and dround in error messages.

Are you suggesting to get rid of the "dround" alias completely?


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@286
PS1, Line 286:   [['round','dround'], 'DECIMAL', ['DOUBLE', 'INT'], 
'impala::DecimalFunctions::RoundTo'],
> Can you organize these and comment on whether these are used in v1/v2 or bo
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@287
PS1, Line 287:   [['round','dround','round_v1','dround_v1'],
> Why keep the dround_v1 alias?
Same reason as above.


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@403
PS1, Line 403:   [['round','dround','round_v1','dround_v1'],
> Why keep the dround_v1 alias in these?
Answered above.


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup@70
PS1, Line 70:   private TQueryOptions queryOptions;
> add TODO to remove when decimal v1 is gone, maybe come up with a specific t
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@100
PS1, Line 100: FunctionCallExpr functionCallExpr = new 
FunctionCallExpr(fnName, params);
> newline
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@111
PS1, Line 111: // nullif(x, y) -> if(x DISTINCT FROM y, x, NULL)
> newline
Done


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@385
PS1, Line 385:* This can only be called for functions that return wildcard 
decimals and the first
> Is this comment still accurate?
Fixed the comment.


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@407
PS1, Line 407:   // The situtation where none of the parameters are 
decimal, but the return type is
> * Shrink comment to:
Done



--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Tue, 07 Nov 2017 00:35:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-10-27 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8309 )

Change subject: IMPALA-5019: Decimal V2 addition
..

IMPALA-5019: Decimal V2 addition

In this patch, we implement the new decimal return type rules for
addition expressions. These rules become active when the query option
DECIMAL_V2 is enabled. The algorithm for determining the type of the
result is described in the JIRA.

DECIMAL V1:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,38) |
++

DECIMAL V2:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,6)  |
++

This patch required backend changes. We implement an algorithm where
we handle the whole and fractional parts separately, and then combine
them to get the final result. This is more complex and slower. We try
to avoid this by first checking if the result would fit into int128.

Testing:
- Added expr tests.
- Tested locally on my machine with a script that generates random
  decimal numbers and checks that Impala adds them correctly.

Performance:

For the common case, performance remains the same.
  select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1)

  BEFORE: 4.74s
  AFTER:  4.73s

In this case, we check if it is necessary to do the complex addition,
and it turns out to be not necessary. We see a slowdown because the
result needs to be scaled down by dividing.
  select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19)

  BEFORE: 1.63s
  AFTER:  13.57s

In following case, we take the most complex path and see the most
signification perfmance hit.
  select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37))

  BEFORE: 1.63s
  AFTER: 20.57

Change-Id: I401049c56d910eb1546a178c909c923b01239336
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
4 files changed, 387 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/3
--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-10-27 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8309 )

Change subject: IMPALA-5019: Decimal V2 addition
..


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/8309/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8309/1//COMMIT_MSG@31
PS1, Line 31: to avoid this by first checking if the result would into int128.
> fit?
Done


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@166
PS2, Line 166: LeadingZeroAdjustment
> I think the name could make it clearer what it's computing. Maybe something
Done. I like the new name.


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@170
PS2, Line 170: floor(log2(b))
> The table values are actually floor(log2(10^b)) for i = 0, 1, 2, 3... etc,
Yes, exactly. Fixed the comment.


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@181
PS2, Line 181: same_sign
> We usually make numeric/bool template arguments upper case to make it clear
Separated this large function into 3 smaller ones, as you suggested.


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@185
PS2, Line 185:   // This is expected to be handled by the caller.
> What about the case when they're zero? That's also meant to be handled by t
It wasn't possible for this condition to fail because at least one must be 
greater than zero for the following reasons:
- If they are both zero, we will not be calling this function, because of the 
leading zero test.
- If one is negative and the other is zero, we will invert it on line 327

I modified this check in the new implementation.

Yes we do have cases, where we add zero.


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@272
PS2, Line 272: DCHECK_EQ(result_scale, std::max(this_scale, other_scale));
> A brief comment that this is guaranteed by the frontend migh help people ne
Done


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@277
PS2, Line 277:   DCHECK(!*overflow) << "Cannot overflow unless result is 
Decimal16Value";
> extra indent here
Done


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@287
PS2, Line 287:   if (this_scale < other_scale) {
> It might be slightly easier to follow if the branches were(delta_scale < 0)
Rewrote the code to make it more clear.


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@299
PS2, Line 299: into
> long line.
Done


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@305
PS2, Line 305: bool ovf = AdjustToSameScale(*this, this_scale, other, 
other_scale,
> It feels a bit weird that we're calculating delta_scale above and inside Ad
Delta scale is different in AdjustToSameScale(). There are two delta scales in 
this patch: between x and y and between max(x_scale, y_scale) and result_scale.

What we are doing here and in AddLarge() is first adjust to the same scale, do 
the addition, then scale down to result_scale.


http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@310
PS2, Line 310: if (delta_scale > 0) {
> This might need a brief comment to explain why it's necessary (or perhaps t
Added comment.


http://gerrit.cloudera.org:8080/#/c/8309/2/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java:

http://gerrit.cloudera.org:8080/#/c/8309/2/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java@228
PS2, Line 228:* TODO: implement V2 rules for ADD/SUB.
> TODO not needed.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 28 Oct 2017 00:02:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-10-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8404 )

Change subject: IMPALA-5017: Error on decimal overflow
..


Patch Set 1:

Ooops, canceled


--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 27 Oct 2017 03:32:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-10-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8404


Change subject: IMPALA-5017: Error on decimal overflow
..

IMPALA-5017: Error on decimal overflow

Before this patch, decimal operations would either silently overflow (in
the case of sum() and avg()), or produce a warning.

In this patch, the beharior is changed so that an error is produced in
the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behavior is
unchanged.

We introduce overflow checks when computing sum() and avg(). This
results in a ~30% performance regression when we are in decimal v2 mode
compared to decimal v1.

Benchmarks:
  Query:
  select sum(dec_38_19) from decimal_tbl
Decimal v1: 13.09s
Decimal v2: 17.10s

  Query:
  select avg(dec_38_19) from decimal_tbl
Decimal v1: 13.60s
Decimal v2: 19.10s

Testing:
- Added several end to end tests.
- Updated Expr tests to check for error in case of overflow.

Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
6 files changed, 186 insertions(+), 69 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/8404/1
--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-10-25 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8398


Change subject: IMPALA-3436: Return a decimal when rounding a double
..

IMPALA-3436: Return a decimal when rounding a double

Before this patch, the function round(double, int) returned a double,
which is inconsistent with other round() functions.

In this patch, we change the return type of round(double, int) to be
decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled,
the function returns a double, as before.

This is implemented by introducing a "hack" where we make the parser
aware of query options and rename the function to round_v1() during
parsing if decimal_v2 is not enabled. To make this hack invisible, we
also rename it back to round() in toSql().

This hack should be removed when we remove decimal v1.

Testing:
- Added EE tests.

Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
---
M be/src/exprs/decimal-functions-ir.cc
M be/src/exprs/decimal-functions.h
M be/src/exprs/decimal-operators.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
8 files changed, 227 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/1
--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow

2017-10-23 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8329 )

Change subject: IMPALA-4964: Fix Decimal modulo overflow
..

IMPALA-4964: Fix Decimal modulo overflow

The modulo operation between two decimals should never overflow. Before
this patch, there would be an overflow if the scale difference between
the two decimals was large. We would try to scale up the one with the
smaller scale, so that the scales matched, which could result in an
overflow.

We fix the problem by first checking if the scaled up value would fit
into 128 bits by estimating the number of leading zeros in it. If we
detect that 128 bits is not enough, we convert both numbers to int256,
do the operation, then convert back to 128 bits.

Testing:
- Added some expr tests that excercise the new code path.

Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.inline.h
3 files changed, 125 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8329/3
--
To view, visit http://gerrit.cloudera.org:8080/8329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
Gerrit-Change-Number: 8329
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-23 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..

IMPALA-5018: Error on decimal modulo or divide by zero

Before this patch, decimal operations would never produce an error.
Division by and modulo zero would result in a NULL.

In this patch, we change this behavior so that we raise an error
instead of returning a NULL. We also modify the format of the decimal
expr tests format to also include an error field.

Testing:
- Added several expr and end to end tests.

Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
3 files changed, 682 insertions(+), 565 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/4
--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 4
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-23 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/decimal-operators-ir.cc@729
PS3, Line 729: divide
> My thought here when I suggested it was that the module was just an output
In one of the previous comments, Vuk mentioned that Postgres returns an 
identical error message for both mod and division: "ERROR: division by zero"

Personally, I don't think it's too confusing if it leave it is as. Or maybe it 
can be modified to "Cannot modulo decimal or divide decimal by zero" What do 
you think?


http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/expr-test.cc@2264
PS3, Line 2264:  { true, false, 0, 38, 19 }}},
> it would be good to test also
Added tests to pytest.


http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/expr-test.cc@2506
PS3, Line 2506: 0/0
> I think we should even change the math function fmod to return a consistent
Created IMPALA-6103.


http://gerrit.cloudera.org:8080/#/c/8344/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

http://gerrit.cloudera.org:8080/#/c/8344/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test@46
PS3, Line 46: cast
> is that cast necessary? Let's also verify you get the error without the cas
Done



--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 24 Oct 2017 00:12:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow

2017-10-23 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8329 )

Change subject: IMPALA-4964: Fix Decimal modulo overflow
..

IMPALA-4964: Fix Decimal modulo overflow

The modulo operation between two decimals should never overflow. Before
this patch, there would be an overflow if the scale difference between
the two decimals was large. We would try to scale up the one with the
smaller scale, so that the scales matched, which could result in an
overflow.

We fix the problem by first checking if the scaled up value would fit
into 128 bits by estimating the number of leading zeros in it. If we
detect that 128 bits is not enough, we convert both numbers to int256,
do the operation, then convert back to 128 bits.

Testing:
- Added some expr tests that excercise the new code path.

Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.inline.h
3 files changed, 120 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8329/2
--
To view, visit http://gerrit.cloudera.org:8080/8329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
Gerrit-Change-Number: 8329
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow

2017-10-23 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8329 )

Change subject: IMPALA-4964: Fix Decimal modulo overflow
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@a281
PS1, Line 281:
 :
> May help to keep this comment.
Done


http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@181
PS1, Line 181: CheckAdjustment
> MinLeadingZero() or some other more meaningful name ?
Done


http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@497
PS1, Line 497: (other.value() == 0)
> nit: parenthesis not needed.
Done


http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@499
PS1, Line 499: // Mod by 0.
> Shouldn't we raise an error on decimal modulo 0 operations like you did in
Erroring on mod 0 is handled in another of my patches. We raise an error in 
that patch after we set is_nan in this function.


http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@498
PS1, Line 498:   if (UNLIKELY(*is_nan)) {
 : // Mod by 0.
 : return DecimalValue();
 :   }
> nit: one liner.
Done


http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@505
PS1, Line 505: if (sizeof(RESULT_T) < 16 ||
 :   result_precision < 38 ||
 :   // If the scales are the same, there is no danger in 
overflowing due to scaling up.
 :   this_scale == other_scale ||
 :   detail::CheckAdjustment(value(), this_scale, 
other.value(), other_scale) >= 2) {
> May help to add a comment to explain what this check is trying to achieve.
Added a comment to make it more clear. Added a comment about overflow at the 
bottom. Mod() should never overflow.


http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@514
PS1, Line 514: DCHECK(
> DCHECK_LT();
Unfortunately this does not work with int128_t.



--
To view, visit http://gerrit.cloudera.org:8080/8329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
Gerrit-Change-Number: 8329
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 23 Oct 2017 23:19:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..


Patch Set 3:

I spoke to Greg and Alex yesterday, and we agreed that decimal_v2 should always 
be in strict mode. This means that when decimal_v2 is enabled, we should always 
error on overflows and division by zero. This is the behavior that more 
traditional databases have.


--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 21 Oct 2017 19:15:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..


Patch Set 3:

Rebased.


--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 21 Oct 2017 00:19:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..

IMPALA-5018: Error on decimal modulo or divide by zero

Before this patch, decimal operations would never produce an error.
Division by and modulo zero would result in a NULL.

In this patch, we change this behavior so that we raise an error
instead of returning a NULL. We also modify the format of the decimal
expr tests format to also include an error field.

Testing:
- Added several expr and end to end tests.

Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
3 files changed, 656 insertions(+), 565 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/3
--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8344/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8344/2/be/src/exprs/expr-test.cc@1561
PS2, Line 1561: expected null, scaled_val, precision, scale for V1
  : //expected null, scaled_val, precision, scale for V2 }}
> that needs updating
Done



--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 21 Oct 2017 00:19:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5018: Error on decimal divide by and modulo zero

2017-10-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal divide by and modulo zero
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/decimal-operators-ir.cc@737
PS1, Line 737: Decimal division by or modulo zero
> For either / or %, postgres reports:
Done


http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/expr-test.cc@1567
PS1, Line 1567:   { "cast(1.23 as decimal(8,2)) + cast(1 as decimal(20,3))", {{ 
false, false, 2230, 21, 3 }}},
> Long lines.
Done


http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/expr-test.cc@1567
PS1, Line 1567:   { "cast(1.23 as decimal(8,2)) + cast(1 as decimal(20,3))", {{ 
false, false, 2230, 21, 3 }}},
> Long lines.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 20 Oct 2017 19:56:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..

IMPALA-5018: Error on decimal modulo or divide by zero

Before this patch, decimal operations would never produce an error.
Division by and modulo zero would result in a NULL.

In this patch, we change this behavior so that we raise an error
instead of returning a NULL. We also modify the format of the decimal
expr tests format to also include an error field.

Testing:
- Added several expr and end to end tests.

Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
3 files changed, 654 insertions(+), 563 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/2
--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5018: Error on decimal divide by and modulo zero

2017-10-19 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8344


Change subject: IMPALA-5018: Error on decimal divide by and modulo zero
..

IMPALA-5018: Error on decimal divide by and modulo zero

Before this patch, decimal operations would never produce an error.
Division by and modulo zero would result in a NULL.

In this patch, we change this behavior so that we raise an error
instead of returning a NULL. We also modify the format of the decimal
expr tests format to also include an error field.

Testing:
- Added several expr and end to end tests.

Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
3 files changed, 619 insertions(+), 551 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/1
--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow

2017-10-18 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8329


Change subject: IMPALA-4964: Fix Decimal modulo overflow
..

IMPALA-4964: Fix Decimal modulo overflow

The modulo operation between two decimals should never overflow. Before
this patch, there would be an overflow if the scale difference between
the two decimals was large. We would try to scale up the one with the
smaller scale, so that the scales matched, which could result in an
overflow.

We fix the problem by first checking if the scaled up value would fit
into 128 bits by estimating the number of leading zeros in it. If we
detect that 128 bits is not enough, we convert both numbers to int256,
do the operation, then convert back to 128 bits.

Testing:
- Added some expr tests that excercise the new code path.

Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.inline.h
3 files changed, 116 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8329/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
Gerrit-Change-Number: 8329
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-10-18 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8309 )

Change subject: IMPALA-5019: Decimal V2 addition
..

IMPALA-5019: Decimal V2 addition

In this patch, we implement the new decimal return type rules for
addition expressions. These rules become active when the query option
DECIMAL_V2 is enabled. The algorithm for determining the type of the
result is described in the JIRA.

DECIMAL V1:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,38) |
++

DECIMAL V2:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,6)  |
++

This patch required backend changes. We implement an algorithm where
we handle the whole and fractional parts separately, and then combine
them to get the final result. This is more complex and slower. We try
to avoid this by first checking if the result would into int128.

Testing:
- Added expr tests.
- Tested locally on my machine with a script that generates random
  decimal numbers and checks that Impala adds them correctly.

Performance:

For the common case, performance remains the same.
  select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1)

  BEFORE: 4.74s
  AFTER:  4.73s

In this case, we check if it is necessary to do the complex addition,
and it turns out to be not necessary. We see a slowdown because the
result needs to be scaled down by dividing.
  select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19)

  BEFORE: 1.63s
  AFTER:  13.57s

In following case, we take the most complex path and see the most
signification perfmance hit.
  select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37))

  BEFORE: 1.63s
  AFTER: 20.57

Change-Id: I401049c56d910eb1546a178c909c923b01239336
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
4 files changed, 297 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/2
--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-10-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8309 )

Change subject: IMPALA-5019: Decimal V2 addition
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8309/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java:

http://gerrit.cloudera.org:8080/#/c/8309/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java@268
PS1, Line 268: // Not yet implemented - fall back to V1 rules.
I think this can now be removed now.



--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 02:30:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-10-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8309


Change subject: IMPALA-5019: Decimal V2 addition
..

IMPALA-5019: Decimal V2 addition

In this patch, we implement the new decimal return type rules for
addition expressions. These rules become active when the query option
DECIMAL_V2 is enabled. The algorithm for determining the type of the
result is described in the JIRA.

DECIMAL V1:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,38) |
++

DECIMAL V2:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,6)  |
++

This patch required backend changes. We implement an algorithm where
we handle the whole and fractional parts separately, and then combine
them to get the final result. This is more complex and slower. We try
to avoid this by first checking if the result would into int128.

Testing:
- Added expr tests.
- Tested locally on my machine with a script that generates random
  decimal numbers and checks that Impala adds them correctly.

Performance:

For the common case, performance remains the same.
  select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1)

  BEFORE: 4.74s
  AFTER:  4.73s

In this case, we check if it is necessary to do the complex addition,
and it turns out to be not necessary. We see a slowdown because the
result needs to be scaled down by dividing.
  select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19)

  BEFORE: 1.63s
  AFTER:  13.57s

In following case, we take the most complex path and see the most
signification perfmance hit.
  select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37))

  BEFORE: 1.63s
  AFTER: 20.57

Change-Id: I401049c56d910eb1546a178c909c923b01239336
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
3 files changed, 282 insertions(+), 69 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/1
--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

2017-10-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/6023 )

Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@516
PS7, Line 516:   int256_t x = ConvertToInt256(buckets.value()) * 
ConvertToInt256(width_size.value());
> Sure, that's true. I don't think that this function is meant to be used tha
This idea may give a nice performance boost (if it works) because all the heavy 
lifting (such as dividing and maybe converting to int256) is done once at the 
beginning when we are constructing the array. The array will contain values 
that have the same precision and scale as the input expression so all we have 
to do is O(log(number of buckets)) comparisons of expr to array elements in 
case of binary search. Or O(number of buckets) comparisons if number of buckets 
is small.

If number of buckets is very large, we can fall back to the current 
implementation. (we can have 2 versions WidthBucketImpl such as 
WidthBucketImplSmall WidthBucketImplLarge)



--
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-Change-Number: 6023
Gerrit-PatchSet: 7
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 10 Oct 2017 00:49:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

2017-10-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/6023 )

Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/expr-test.cc@4247
PS7, Line 4247:   TestValue("width_bucket(6.3, 2, 17, 2)", TYPE_BIGINT, 1);
You should include a case like (1, 1, 6, 3)


http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@429
PS7, Line 429: bucket_width
This should be called bucket_number to make it more clear


http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@431
PS7, Line 431: width_size
width_size is a confusing name. This should be called something like 
"distance_from_min".


http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@516
PS7, Line 516:   int256_t x = ConvertToInt256(buckets.value()) * 
ConvertToInt256(width_size.value());
> Won't this require a lot of memory to create such a array?
Sure, that's true. I don't think that this function is meant to be used that 
way though. It's difficult to imagine someone wanting to use over 1000 buckets. 
In that case, we could fall back to the current implementation.



--
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-Change-Number: 6023
Gerrit-PatchSet: 7
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 10 Oct 2017 00:41:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 12: Code-Review+2

Fixed the uninitialized variable bug in expr-test.cc. Forwarding the +2. Thanks 
Jim!


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 12
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 04 Oct 2017 05:26:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
5 files changed, 333 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/12
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 12
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 11: Code-Review+2

Updated FE tests. Forwarding the +2.


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 11
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 04 Oct 2017 02:49:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
5 files changed, 333 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/11
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 11
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 11:

The run failed due to IMPALA-6009. Rebase and try again.


--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 11
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Tue, 03 Oct 2017 22:36:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 10: Code-Review+2

Rebased, forwarding the +2


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 10
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 03 Oct 2017 21:51:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 330 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/10
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 10
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 330 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/9
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 9
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1672
PS8, Line 1672: not require int256.
> if result does not require int256, why does it overflow (return null)?
Added comment.


http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1675
PS8, Line 1675: { true, 0, 38, 0 }}},
> when both V1 and V2 give the same answer, you don't have to include both re
Done


http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1680
PS8, Line 1680:   { "cast(18446744073709551615 as decimal(38,0)) * 
cast(9223372036854775808 as decimal(38,0))",
> It might be helpful if you mentioned how these values where computed in the
Added comment.


http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1702
PS8, Line 1702: 85070591730234615865843651857942052864
> how did you verify these results?
I used Python to make sure I get the same results.



--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 8
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 03 Oct 2017 21:26:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 8:

I made some additional changes in patch 8 as a result of thinking about your 
comment. Dan, can you take a look one more time?


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 8
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 03 Oct 2017 02:55:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 332 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/8
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 8
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7438/7/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/7438/7/be/src/runtime/decimal-value.inline.h@249
PS7, Line 249:   } else {
> maybe a quick comment:
Yes, your understanding is correct. Added comment.



--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 7
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 03 Oct 2017 02:53:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 292 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/7
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 7
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 6:

I think we should explore your suggestion to try SEE/AVX in a follow on patch. 
(to optimze both multiplication and division if it's possible)


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 6
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 27 Sep 2017 01:24:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 289 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/6
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 6
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-18 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7438/5/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 203:   RESULT_T result = value / multiplier;
> Another idea: there are a finite number of possible delta_scale values.  It
After trying different approaches, this one produced the best results: 
http://github.mtv.cloudera.com/Taras/Impala/commit/950892a48e555ba24dcca772851d251ae3ecb17f

Before: 66.13s
After: 55.63s

So, a small improvement. Still not that great: without the multiplication 
patch, it took 8.26s


-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Benchmarks:

In the following benchmarks, we are selecting from lineitem_big, which
has about 100 times as many rows as the normal lineitem.

Query:
  select
sum(l_quantity * l_tax) +
sum(l_extendedprice * l_discount)
  from lineitem_big;

Before:
  DECIMAL_V2 disabled: 2.24s
  DECIMAL_V2 enabled : 2.14

After:
  DECIMAL_V2 disabled: 2.66s
  DECIMAL_V2 enabled : 2.25s

Query:
  select
sum(l_quantity * l_tax * cast(1 as decimal(38, 0))) +
sum(l_extendedprice * l_discount * cast(1 as decimal(38, 0)))
  from lineitem_big

Before:
  DECIMAL_V2 disabled: 2.34s
  DECIMAL_V2 enabled : 2.36s

After:
  DECIMAL_V2 disabled: 4.25s
  DECIMAL_V2 enabled : 4.15s

Query:
  select
sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) +
sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37)))
  from lineitem_big

Before:
  DECIMAL_V2 disabled: 2.84s
  DECIMAL_V2 enabled : 8.26s

After:
  DECIMAL_V2 disabled: 69.16s
  DECIMAL_V2 enabled : 66.13s

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 282 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-09-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
> The same JIRA is listed twice.
Done


http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 301:   // The following is frought with apparent difficulty, as there 
is only 1 bit
> It seems most efficient development-wise if we give this a try now while we
Implemented checking the leading zeros optimization. It doesn't seem to have a 
big impact on the performance of the queries I tried.


-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Benchmarks:

In the following benchmarks, we are selecting from lineitem_big, which
has about 100 times as many rows as the normal lineitem.

Query:
  select
sum(l_quantity * l_tax) +
sum(l_extendedprice * l_discount)
  from lineitem_big;

Before:
  DECIMAL_V2 disabled: 2.24s
  DECIMAL_V2 enabled : 2.14

After:
  DECIMAL_V2 disabled: 2.66s
  DECIMAL_V2 enabled : 2.25s

Query:
  select
sum(l_quantity * l_tax * cast(1 as decimal(38, 0))) +
sum(l_extendedprice * l_discount * cast(1 as decimal(38, 0)))
  from lineitem_big

Before:
  DECIMAL_V2 disabled: 2.34s
  DECIMAL_V2 enabled : 2.36s

After:
  DECIMAL_V2 disabled: 4.25s
  DECIMAL_V2 enabled : 4.15s

Query:
  select
sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) +
sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37)))
  from lineitem_big

Before:
  DECIMAL_V2 disabled: 2.84s
  DECIMAL_V2 enabled : 8.26s

After:
  DECIMAL_V2 disabled: 69.16s
  DECIMAL_V2 enabled : 66.13s

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 282 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-09-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG
Commit Message:

Line 75: DECIMAL_V2 disabled: 2.65s
> So we go down this path for multiplies that overflow, or are close to overf
Running perf top showed that most of the time is spent in 
ScaleDownAndRound(). It looks like it's the division (of 
one int256 by another int256) that is taking the most time.

I also noticed that disabling the codegen slows down the query in the BEFORE 
case by ~10x. So the performance difference before my patch and after is huge 
only when codegen is enabled. Otherwise, it's like a 3x difference. (Do you 
think it's most likely due to the scanner being much faster in the codegen 
case?)


-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-18 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool, which is where the constant length partial tuple data is
written. We also make sure that boundary pool does not hold any tuple
data of returned batches by always deep copying it to output batches.

Testing:
- Ran some tests locally on ASAN build.
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files. This change was helpful for finding
  issues while developing this patch.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 65 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-18 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7639

to look at the new patch set (#6).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool, which is where the constant length partial tuple data is
written. We also make sure that boundary pool does not hold any tuple
data of returned batches by always deep copying it to output batches.

Testing:
- Ran some tests locally on ASAN build.
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files. This change was helpful for finding
  issues while developing this patch.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 65 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7639/5/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 297: boundary_column_.Clear();
> Can we use the existing CopyBoundaryField() function for consistency? Other
I tried it and it doesn't quite work (without clearing the boundary field)

This is a slightly different case. CopyBoundaryField() copies the contents of 
the field followed by the boundary column to memory allocated from the row 
batch.

In this case, after FillColumns(), field_locations is pointing at the data in 
the boundary column. We can trick CopyBoundaryField() by clearing the boundary 
column right before CopyBoundaryField(), but it's such a weird thing to do. (It 
looks weird because we just cleared the boundary column, why are we trying to 
copy out of it?)

Another thing we can do, is check that the field is pointing to a different 
memory location than the boundary column, but I also think that's weird.

I suggest we leave it as is (for me, the way it is now is the least confusing 
way). What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7639/5//COMMIT_MSG
Commit Message:

PS5, Line 7: MPALA
Oops.


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written. We also make sure that boundary pool does not hold any tuple
data of returned batches by always deep copying it to output batches.

Testing:
- Ran some tests locally on ASAN build.
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files. This change was helpful for finding
  issues while developing this patch.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 65 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-16 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7639

to look at the new patch set (#5).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written. We also make sure that boundary pool does not hold any tuple
data of returned batches by always deep copying it to output batches.

Testing:
- Ran some tests locally on ASAN build.
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files. This change was helpful for finding
  issues while developing this patch.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 65 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 808:   // The partial tuple pool is cleared after the data has been 
copied out of it to
> Another idea: Add a separate function for copying the partial tuple into th
Created a function for copying the partial tuple into output batch.


http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

PS4, Line 317: partial_tuple_empty_
> Comment needs update
Done


http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 194:   boost::scoped_ptr boundary_pool_;
> So my question is really whether there's a good reason why the lifetime nee
Fixed the place where we forget to copy the data out. Removed the new memory 
pool that I added.


Line 199:   StringBuffer boundary_row_;
> Can you clarify whether this data will be referenced by the output batches.
Done


Line 202:   StringBuffer boundary_column_;
> Same here. It looks like some attempt is made to copy data out of this colu
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-15 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 194:   boost::scoped_ptr boundary_pool_;
> I still don't understand the memory lifetime for boundary_row_, boundary_co
Data is not always copied out of boundary_col. It can end up here: 
http://github.mtv.cloudera.com/CDH/Impala/blob/e5e444a89008a22f987c517ef1ecaa9f1693b060/be/src/exec/text-converter.inline.h#L71

This happens if reuse_data is true, which it always is if codegen is enabled. 
So it's not safe to free the boundary pool. So the life time of the boundary 
pool is different than the partial tuple pool.

Maybe this can be addressed in a different patch?


http://gerrit.cloudera.org:8080/#/c/7639/4/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

Line 217: num_corruptions = rng.randint(0, int(math.log(len(data
> Do these changes reproduce the bug?
No, I don't think they reproduce the bug, but I think these changes are good to 
make in general.


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
constant and variable length data. The new memory pool does not hold
tuple data of returned batches because the data is always deep-copied
into the output batch.

Testing:
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files.
- Ran scanner tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 46 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
constant and variable length data. The new memory pool does not hold
tuple data of returned batches because the data is always deep-copied
into the output batch.

Testing:
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files.
- Ran scanner tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 46 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7639/3/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 804:   // Copy over all materialized slots in the partial tuple.
> Only describes what the code does. How about something like:
Done


Line 808:   // prevent the accumulation of variable length data in it. We 
also recreate the
> The partial tuple isn't re-created here
Done


http://gerrit.cloudera.org:8080/#/c/7639/3/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 186:   /// Utility function to write out 'num_fields' to 'tuple_'.  This 
is used to parse
> Comment is weird, I suggest rephrasing to something like:
Done


Line 227:   /// Mem pool for the partial tuple.
> Mention that it does not hold tuple data of returned batches because the da
Done


Line 230:   /// Memory to store partial tuples split across buffers.  Memory 
comes from
> Mention that this tuple is always deep copied into the output batch
Done


Line 231:   /// partial_tuple_pool_.  There is only one tuple allocated for 
this object and reused
> The 2nd sentence does not seem to apply anymore because we realloc the part
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
will contains partial tuple contant length and variable length data.

Testing:
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files.
- Ran scanner tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 46 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
will contains partial tuple contant length and variable length data.

Testing:
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files.
- Ran scanner tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 42 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
will contains partial tuple contant length and variable length data.

Testing:
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files.
- Ran scanner tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 42 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
will contains partial tuple contant length and variable length data.

Testing:
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files.
- Ran scanner tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 43 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 169: row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), 
false);
> I don't think that data from the boundary pool is ever used outside of this
Actually, after running some tests with ASAN, it turns out that it's not ok to 
clear the boundary pool. The boundary col contains some var len data that is 
used after Close() is called.

Also, what do you mean we should be transferring the contents of boundary pool 
below instead of clearing it? Do you mean in the else clause? If row batch is 
null, where would would we transfer the contents to?


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/7639/2//COMMIT_MSG
Commit Message:

Line 7: MPALA-5776: Write partial tuple to the correct mempool
> Missing I.
Done


http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 169: row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), 
false);
> Does this even make sense? are there any cases where it's valid for the row
I don't think that data from the boundary pool is ever used outside of this 
class, so it makes sense to clear it.

I analyzed the code, and it doesn't look to me like data in boundary pool is 
ever used, but how can we be 100% sure?


Line 269: bool eosr = true;
> Remove. This not needed/used.
It's used on line 277. Are you suggesting to remove it from the signature of 
FillByteBuffer as well?


Line 299: boundary_column_.Clear();
> What's the point of clearing the boundary column here? I don't think the sc
Removed this.


PS2, Line 765: batches
> I'm not sure that I understand the reference to batches. Do they mean block
Yeah, "batches" doesn't make sense here. Changed it to blocks.


Line 806:   partial_tuple_->DeepCopy(tuple_, *scan_node_->tuple_desc(), 
pool);
> Can you add a column clarifying what the intent is. E.g. "Copy over all mat
Done


Line 808:   boundary_row_.Reset();
> Another idea: Add a separate function for copying the partial tuple into th
I added a new mem pool, so boundary row and columns are not affected any more. 
Alex, do you guys think it's still worth creating a separate function?


PS2, Line 810: emptied
> cleared? The comment makes me thing that all the memory has been freed, but
Done


Line 814:   partial_tuple_ = Tuple::Create(tuple_byte_size_, 
boundary_pool_.get());
> Is it possible to just initialize partial_tuple_ when it's needed? I.e. bef
Done


Line 896: int num_fields, bool copy_strings) {
> Can you remove the 'copy_strings' parameter? It's not used in this function
Done


http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 188:   /// the boundary pool.
> Hah! If only the code had matched the comment...
totally agree


Line 194:   /// Mem pool for boundary_row_ and boundary_column_.
> Needs updating since it also backs partial_tuple_.
I made changes in the next patch so that this comment is now true.


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
will contains partial tuple contant length and variable length data.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
2 files changed, 29 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
..

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by creating a new memory pool for the partial tuple
will contains partial tuple contant length and variable length data.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
2 files changed, 29 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2422: Fix escaping in LIKE clause

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-2422: Fix escaping in LIKE clause
..

IMPALA-2422: Fix escaping in LIKE clause

There are two stages to processing a like clause. First, we determine if
it is possible to convert the expression to a simpler function, such as
StartsWith() or EndsWith(). If not, then we use a Regex libarary to
compute the result.

There was a problem in the logic that determines if it is possible to
use a simpler function. It did not take into account escape characters.
For example, "abc\%" was incorrectly converted to StartsWith("abc\").

Testing:
-Added expr tests.

Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b
---
M be/src/exprs/expr-test.cc
M be/src/exprs/like-predicate.cc
2 files changed, 28 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7660/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-2422: Fix escaping in LIKE clause

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-2422: Fix escaping in LIKE clause
..

IMPALA-2422: Fix escaping in LIKE clause

There are two stages to processing a like clause. First, we determine if
it is possible to convert the expression to a simpler function, such as
StartsWith() or EndsWith(). If not, then we use a Regex libarary to
compute the result.

There was a problem in the logic that determines if it is possible to
use a simpler function. It did not take into account escape characters.
For example, "abc\%" was incorrectly converted to StartsWith("abc\").

Testing:
-Added expr tests.

Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b
---
M be/src/exprs/expr-test.cc
M be/src/exprs/like-predicate.cc
2 files changed, 28 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7660/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-5681: release reservation from blocking operators

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5681: release reservation from blocking operators
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7619/4/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS4, Line 1203:  
put a comma here


-- 
To view, visit http://gerrit.cloudera.org:8080/7619
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2422: Fix escaping in LIKE clause

2017-08-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7660

Change subject: IMPALA-2422: Fix escaping in LIKE clause
..

IMPALA-2422: Fix escaping in LIKE clause

There are two stages to processing a like clause. First, we determine if
it is possible to convert the expression to a simpler function, such as
StartsWith() or EndsWith(). If not, then we use a Regex libarary to
compute the result.

There was a problem in the logic that determines if it is possible to
use a simpler function. It did not take into account escape characters.
For example, "abc\%" was incorrectly converted to StartsWith("abc\").

Testing:
-Added expr tests.

Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b
---
M be/src/exprs/expr-test.cc
M be/src/exprs/like-predicate.cc
2 files changed, 28 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7660/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 13 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 13 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 13 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 14 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7639

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 11 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-08-03 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG
Commit Message:

Line 75: DECIMAL_V2 disabled: 2.65s
> This is scary and I don't think we should consider merging this until it's 
It takes time to convert a number from 128 bits to 256 bits. Also, Boost does 
something like 4 128 bit multiplications internally in the 256 bit 
multiplication function (so it makes sense for it to be at least 4x slower). I 
think the 256 bit multiplication function is also not optimized for the worst 
case (where the result actually needs more than 128 bits).

Also, it takes time to convert 2 numbers to 256 bit numbers and then convert 1 
number back.

I don't think it's that surprising that it's much slower. 256 bit 
multiplication is slow and is not implemented in hardware like others.

Remember, this benchmark is the absolute worst case scenario, where we do a 256 
bit multiplication for every row.


-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-5722: Fix string to decimal cast
..

IMPALA-5722: Fix string to decimal cast

When converting a string to a decimal, we didn't handle the case where
the exponent is a large negative number. The function for computing the
divisor returns -1 when the exponent is too large. The problem is fixed
by making sure that the divisor is positive.

Testing:
-Added decimal tests.

Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
---
M be/src/runtime/decimal-test.cc
M be/src/util/decimal-util.h
M be/src/util/string-parser.h
3 files changed, 31 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7517/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Benchmarks:

Query:
select
  sum(l_quantity * l_tax) +
  sum(l_extendedprice * l_discount)
from lineitem_big;

Before:
DECIMAL_V2 disabled: 6.68s
DECIMAL_V2 enabled : 6.77s

After:
DECIMAL_V2 disabled: 6.66s
DECIMAL_V2 enabled : 6.58s

In the above benchmark, we are selecting from lineitem_big, which has
about 100 times as many rows as the normal lineitem.

Query:
select
  sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) +
  sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37)))
from lineitem

Before:
DECIMAL_V2 disabled: 0.23s
DECIMAL_V2 enabled : 0.45s

After:
DECIMAL_V2 disabled: 2.65s
DECIMAL_V2 enabled : 2.54s

The query is slower after the patch because the intermediate result is
int256.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 253 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 301:   // doubling such a value would overflow in two's complement.  
However, we
> A while loop will perform terribly, but there are intrinsics that can be us
Added a TODO.


http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 65: return value < 0 ? -1 : 1;
> I think we want to think carefully about using the Sign function with int25
I included the benchmark result in the commit message.


http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 336
> Really?  That's crazy.
I removed this function so that the function on line 58 would get called.


http://gerrit.cloudera.org:8080/#/c/7438/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java:

Line 260: resultPrecision = p1 + p2 + 1;
> I think the 0.999 * 0.999 case is a fluke - we are sacrificing precision fo
This is more of a product question. I'll ask Greg about this later.


-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Benchmarks:

Query:
select
  sum(l_quantity * l_tax) +
  sum(l_extendedprice * l_discount)
from lineitem_big;

Before:
DECIMAL_V2 disabled: 6.68s
DECIMAL_V2 enabled : 6.77s

After:
DECIMAL_V2 disabled: 6.66s
DECIMAL_V2 enabled : 6.58s

In the above benchmark, we are selecting from lineitem_big, which has
about 100 times as many rows as the normal lineitem.

Query:
select
  sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) +
  sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37)))
from lineitem

Before:
DECIMAL_V2 disabled: 0.23s
DECIMAL_V2 enabled : 0.45s

After:
DECIMAL_V2 disabled: 2.65s
DECIMAL_V2 enabled : 2.54s

The query is slower after the patch because the intermediate result is
int256.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 253 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5722: Fix string to decimal cast
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

Line 305:   // When the absolute value of the exponent becomes too large, it is 
considered
> This seems a bit weird - it seems like it should be underflow. It seems lik
Changed the be code so that a large negative number would lead to an underflow 
(instead of overflow).


http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 63:   DCHECK_GE(result * 10, result);
> The generic integer overflow support wasn't added until gcc5 I believe. Agr
Added a TODO.


http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/util/string-parser.h
File be/src/util/string-parser.h:

PS1, Line 225: DCHECK
> DCHECK_EQ
DCHECK_EQ doesn't work with int128. Added a comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/7517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-5722: Fix string to decimal cast
..

IMPALA-5722: Fix string to decimal cast

When converting a string to a decimal, we didn't handle the case where
the exponent is a large negative number. The function for computing the
divisor returns -1 when the exponent is too large. The problem is fixed
by making sure that the divisor is positive.

Testing:
-Added decimal tests.

Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
---
M be/src/runtime/decimal-test.cc
M be/src/util/decimal-util.h
M be/src/util/string-parser.h
3 files changed, 31 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7517/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5636: changed the format metadata of repetition level

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5636: changed the format metadata of repetition level
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7514/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5636: changed the format metadata of repetition level from 
bit_packing to RLE
> Try to keep the subject line short (if possible, something like 50 characte
It looks like patch #3 is a draft (it should be published). Patch #3 is only 
visible to those who commented on the patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/7514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5636: changed the format metadata of repetition level

2017-07-28 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5636: changed the format metadata of repetition level
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7514/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5636: changed the format metadata of repetition level from 
bit_packing to RLE
Try to keep the subject line short (if possible, something like 50 characters)

Also, use the imperative mood in the subject line. i.e. replace the word 
"changed" with "Change". Also, capitalize the first letter.

Give a brief description of the patch and why it's needed in the body of the 
commit message.


-- 
To view, visit http://gerrit.cloudera.org:8080/7514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi left joins

2017-07-27 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5689: Avoid inverting non-equi left joins
..


Patch Set 4: Code-Review+2

Rebased and forwarding the +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi left joins

2017-07-27 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-5689: Avoid inverting non-equi left joins
..

IMPALA-5689: Avoid inverting non-equi left joins

When checking if a join can be inverted, we forgot to also check that
the resulting join would not be a non-equi right semi-join or a
non-equi right outer-join. We currently do not support those kinds of
joins in the backend.

Testing:
-Added a planner test

Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab
---
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-loop-join.test
3 files changed, 80 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/7476/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi left joins

2017-07-27 Thread Taras Bobrovytsky (Code Review)
Hello Bharath Vissapragada, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7476

to look at the new patch set (#4).

Change subject: IMPALA-5689: Avoid inverting non-equi left joins
..

IMPALA-5689: Avoid inverting non-equi left joins

When checking if a join can be inverted, we forgot to also check that
the resulting join would not be a non-equi right semi-join or a
non-equi right outer-join. We currently do not support those kinds of
joins in the backend.

Testing:
-Added a planner test

Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab
---
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-loop-join.test
3 files changed, 80 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/7476/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi right joins

2017-07-27 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5689: Avoid inverting non-equi right joins
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7476/3//COMMIT_MSG
Commit Message:

PS3, Line 7: right joins
> left joins?
Done


PS3, Line 10: resulting join would not be a non-equi left semi or non-equi left
: outer join
> since you are talking about the resulting join, you should mean " the resul
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast

2017-07-27 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5722: Fix string to decimal cast
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/util/string-parser.h
File be/src/util/string-parser.h:

Line 221: T divisor = DecimalUtil::GetScaleMultiplier(shift);
> I think it can be a huge negative number once it overflows.
Look at decimal-util.h line 174, where -1 is returned. The function you were 
looking at, Tim, is called only in the case of int256.


-- 
To view, visit http://gerrit.cloudera.org:8080/7517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast

2017-07-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7517

Change subject: IMPALA-5722: Fix string to decimal cast
..

IMPALA-5722: Fix string to decimal cast

When converting a string to a decimal, we didn't handle the case where
the exponent is a large negative number. The function for computing the
divisor returns -1 when the exponent is too large. The problem is fixed
by making sure that the divisor is positive.

Testing:
-Added decimal tests.

Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
---
M be/src/runtime/decimal-test.cc
M be/src/util/decimal-util.h
M be/src/util/string-parser.h
3 files changed, 23 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7517/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-24 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 237 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-24 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 237 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-24 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 237 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5679: Fix Parquet count(*) with group by string

2017-07-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7481

Change subject: IMPALA-5679: Fix Parquet count(*) with group by string
..

IMPALA-5679: Fix Parquet count(*) with group by string

In a recent patch (IMPALA-5036) a bug was introduced where a count(*)
query with a group by a string partition column returned incorrect
results. Data was being written into the tuple at an incorrect offset.

Testing:
- Added an end to end test where we are selecting from a table
  partitioned by string.

Change-Id: I225547574c2b2259ca81cb642d082e151f3bed6b
---
M be/src/exec/hdfs-scan-node-base.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
M tests/query_test/test_aggregation.py
3 files changed, 26 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7481/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I225547574c2b2259ca81cb642d082e151f3bed6b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


  1   2   3   4   5   6   7   >