[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Taras Bobrovytsky has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. IMPALA-3436: Return a decimal when rounding a double Before this patch, the function round(double, int) returned a double, which is inconsistent with other round() functions. In this patch, we change the return type of round(double, int) to be decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled, the function returns a double, as before. This is implemented by introducing a translateion where we make the parser aware of query options and rename the function to round_v1() during parsing if decimal_v2 is not enabled. To make this translation invisible, we also rename it back to round() in toSql(). This translation should be removed when we remove decimal v1. Testing: - Added EE tests. Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 --- M be/src/exprs/decimal-functions-ir.cc M be/src/exprs/decimal-functions.h M be/src/exprs/decimal-operators.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 10 files changed, 268 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/4 -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/8398/3/be/src/exprs/decimal-functions-ir.cc File be/src/exprs/decimal-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8398/3/be/src/exprs/decimal-functions-ir.cc@119 PS3, Line 119: Decimal16Value > Do we always need Decimal16? Or should we switch on ByteSize() like the oth Yes, the result is a decimal with precision 38, so it always requires 16 bytes. http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285 PS3, Line 285: [['round_v1','dround_v1'], > I wonder if we should keep the functionality around under another alias in Ok, are you suggesting to leave the code as is right now and add "fround" when we get rid of decimal_v1? Also, added comment. http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@289 PS3, Line 289: [['round','dround','round_v1','dround_v1'], > This function is used both decimal_v1/v2 modes. Done http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@407 PS1, Line 407: Type == nu > situation This comment was modified in patch 2. http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2439 PS3, Line 2439: AnalyzesOk("select round(cast(1.123 as double), int_col) from functional.alltypes", > add a positive test with a non-NULL constant second argument to both v1/v2 Done -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 15 Nov 2017 00:35:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6183: Fix Decimal to Double conversion
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8547 Change subject: IMPALA-6183: Fix Decimal to Double conversion .. IMPALA-6183: Fix Decimal to Double conversion When converting a decimal to a double, we incorrectly used the powf() function in the backend, which returns a float instead of a double. This caused us to lose precision. We fix the problem by replacing the powf() function with a pow() function, which returns a double. Testing: - Added an EE test. Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3 --- M be/src/runtime/decimal-value.inline.h M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 2 files changed, 27 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8547/1 -- To view, visit http://gerrit.cloudera.org:8080/8547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3 Gerrit-Change-Number: 8547 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Taras Bobrovytsky has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8404 ) Change subject: IMPALA-5017: Error on decimal overflow .. IMPALA-5017: Error on decimal overflow Before this patch, decimal operations would either silently overflow (in the case of sum() and avg()), or produce a warning. In this patch, the behaviour is changed so that an error is produced in the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behaviour is unchanged. We introduce overflow checks when computing sum() and avg(). This results in a ~30% performance regression when we are in decimal v2 mode compared to decimal v1. Benchmarks: Query: select sum(dec_38_19) from decimal_tbl Decimal v1: 13.09s Decimal v2: 17.10s Query: select avg(dec_38_19) from decimal_tbl Decimal v1: 13.60s Decimal v2: 19.10s Query: select avg(dec_9_5) from decimal_tbl Decimal v1: 12.06s Decimal v2: 18.07s Testing: - Added several end to end tests. - Updated Expr tests to check for error in case of overflow. Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test M testdata/workloads/functional-query/queries/QueryTest/decimal.test 6 files changed, 214 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/8404/2 -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8404 ) Change subject: IMPALA-5017: Error on decimal overflow .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@12 PS1, Line 12: In this patch, the beharior is changed so that an error is produced in > I'll also accept behaviour ;) Done http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@27 PS1, Line 27: select avg(dec_38_19) from decimal_tbl > I guess these regressions occur regardless of the width of the input decima Correct. Added another benchmark to illustrate this. http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@413 PS1, Line 413: abs(avg->sum_val16) > DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val4))) { > This isn't correct if sum_val16 and src.val* have opposite sign, is it? That's true. Fixed and added a test case. http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@420 PS1, Line 420: abs(avg->sum_val16) > DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val8))) { > What do you think about only checking for overflow of the integer type at t 1. Are you suggesting to check for something like abs(avg->sum_val16) > 2**128 - abs(src.val8))) ? I don't really see why it is faster to check against 2**128 instead of against MAX_UNSCALED_DECIMAL. 2. What do you mean by initial check? Where would this check be done? Something like (decimal_v2 && sum_val16 > 2**32 && abs(avg->sum_val16) > MAX_UNSCALED_DECIMAL - abs(src.val8)))? -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 07 Nov 2017 21:59:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. IMPALA-5019: Decimal V2 addition In this patch, we implement the new decimal return type rules for addition expressions. These rules become active when the query option DECIMAL_V2 is enabled. The algorithm for determining the type of the result is described in the JIRA. DECIMAL V1: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,38) | ++ DECIMAL V2: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,6) | ++ This patch required backend changes. We implement an algorithm where we handle the whole and fractional parts separately, and then combine them to get the final result. This is more complex and slower. We try to avoid this by first checking if the result would fit into int128. Testing: - Added expr tests. - Tested locally on my machine with a script that generates random decimal numbers and checks that Impala adds them correctly. Performance: For the common case, performance remains the same. select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1) BEFORE: 4.74s AFTER: 4.73s In this case, we check if it is necessary to do the complex addition, and it turns out to be not necessary. We see a slowdown because the result needs to be scaled down by dividing. select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19) BEFORE: 1.63s AFTER: 13.57s In following case, we take the most complex path and see the most signification perfmance hit. select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37)) BEFORE: 1.63s AFTER: 20.57 Change-Id: I401049c56d910eb1546a178c909c923b01239336 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 5 files changed, 392 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/4 -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h@179 PS3, Line 179: int128_t& x_left, int128_t& x_right, int128_t& y_left, int128_t& y_right) { > Use pointers for output arguments: https://google.github.io/styleguide/cppg Done http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h@214 PS3, Line 214: right_overflow > carry_to_left might be a better name? Since the intent is to carry this int Agreed, changed. http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h@243 PS3, Line 243: inline int128_t SubtractLarge(int128_t x, int x_scale, int128_t y, int y_scale, > What do you think about make it so that this always computes x - y, and mak The way it is currently implemented, I don't see an advantage in doing that. We do not use the subtraction operation in this function, we still add x and y together. Are you suggesting to rewrite it, so that x and y are both expected to be positive and we will be doing x - y here? It's not quite clear to me which case would be eliminated if we rewrite it that way. (Both cases on line 264 would still need to be present) http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@305 PS2, Line 305: RESULT_T y = 0; > Ah ok. I wonder if renaming the variable would make it clearer. Something l Ranamed to result_scale_decrease http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@310 PS2, Line 310: } > Thanks. I still think something is missing in the AdjustToSameScale() name Modified the comment of AdjustToSameScale() slightly. I was debating changing the name to AdjustToMaxScale(), but that might be confusing (because max scale is 38)). -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 07 Nov 2017 01:43:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Taras Bobrovytsky has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. IMPALA-3436: Return a decimal when rounding a double Before this patch, the function round(double, int) returned a double, which is inconsistent with other round() functions. In this patch, we change the return type of round(double, int) to be decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled, the function returns a double, as before. This is implemented by introducing a translateion where we make the parser aware of query options and rename the function to round_v1() during parsing if decimal_v2 is not enabled. To make this translation invisible, we also rename it back to round() in toSql(). This translation should be removed when we remove decimal v1. Testing: - Added EE tests. Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 --- M be/src/exprs/decimal-functions-ir.cc M be/src/exprs/decimal-functions.h M be/src/exprs/decimal-operators.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 10 files changed, 264 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/3 -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Taras Bobrovytsky has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. IMPALA-3436: Return a decimal when rounding a double Before this patch, the function round(double, int) returned a double, which is inconsistent with other round() functions. In this patch, we change the return type of round(double, int) to be decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled, the function returns a double, as before. This is implemented by introducing a translateion where we make the parser aware of query options and rename the function to round_v1() during parsing if decimal_v2 is not enabled. To make this translation invisible, we also rename it back to round() in toSql(). This translation should be removed when we remove decimal v1. Testing: - Added EE tests. Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 --- M be/src/exprs/decimal-functions-ir.cc M be/src/exprs/decimal-functions.h M be/src/exprs/decimal-operators.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 10 files changed, 264 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/2 -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG@16 PS1, Line 16: This is implemented by introducing a "hack" where we make the parser > How about we say rewrite or translation instead of hack? I don't think it's Done http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc File be/src/exprs/decimal-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@111 PS1, Line 111: DCHECK(!scale.is_null); > Add a FE test case in AnalyzeExprsTest.java that makes sure this is enforce Added a test. I think we should remain consistent with other decimal round functions. They do not allow a non-constant round arguments, so I think it makes sense to not allow it here as well. http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@116 PS1, Line 116: bool ovf = false; > overflow Done http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h File be/src/exprs/decimal-functions.h: http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h@58 PS1, Line 58: FunctionContext*, const DoubleVal&, const IntVal&); > add arg names for consistency Done http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@284 PS1, Line 284: [['round_v1','dround_v1'], > Why keep the dround_v1 alias? Because what if someone typed in "select dround(x)". We need to be able to differentiate between round and dround in error messages. Are you suggesting to get rid of the "dround" alias completely? http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@286 PS1, Line 286: [['round','dround'], 'DECIMAL', ['DOUBLE', 'INT'], 'impala::DecimalFunctions::RoundTo'], > Can you organize these and comment on whether these are used in v1/v2 or bo Done http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@287 PS1, Line 287: [['round','dround','round_v1','dround_v1'], > Why keep the dround_v1 alias? Same reason as above. http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@403 PS1, Line 403: [['round','dround','round_v1','dround_v1'], > Why keep the dround_v1 alias in these? Answered above. http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup@70 PS1, Line 70: private TQueryOptions queryOptions; > add TODO to remove when decimal v1 is gone, maybe come up with a specific t Done http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@100 PS1, Line 100: FunctionCallExpr functionCallExpr = new FunctionCallExpr(fnName, params); > newline Done http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@111 PS1, Line 111: // nullif(x, y) -> if(x DISTINCT FROM y, x, NULL) > newline Done http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@385 PS1, Line 385:* This can only be called for functions that return wildcard decimals and the first > Is this comment still accurate? Fixed the comment. http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@407 PS1, Line 407: // The situtation where none of the parameters are decimal, but the return type is > * Shrink comment to: Done -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Comment-Date: Tue, 07 Nov 2017 00:35:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. IMPALA-5019: Decimal V2 addition In this patch, we implement the new decimal return type rules for addition expressions. These rules become active when the query option DECIMAL_V2 is enabled. The algorithm for determining the type of the result is described in the JIRA. DECIMAL V1: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,38) | ++ DECIMAL V2: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,6) | ++ This patch required backend changes. We implement an algorithm where we handle the whole and fractional parts separately, and then combine them to get the final result. This is more complex and slower. We try to avoid this by first checking if the result would fit into int128. Testing: - Added expr tests. - Tested locally on my machine with a script that generates random decimal numbers and checks that Impala adds them correctly. Performance: For the common case, performance remains the same. select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1) BEFORE: 4.74s AFTER: 4.73s In this case, we check if it is necessary to do the complex addition, and it turns out to be not necessary. We see a slowdown because the result needs to be scaled down by dividing. select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19) BEFORE: 1.63s AFTER: 13.57s In following case, we take the most complex path and see the most signification perfmance hit. select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37)) BEFORE: 1.63s AFTER: 20.57 Change-Id: I401049c56d910eb1546a178c909c923b01239336 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 4 files changed, 387 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/3 -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/8309/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8309/1//COMMIT_MSG@31 PS1, Line 31: to avoid this by first checking if the result would into int128. > fit? Done http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@166 PS2, Line 166: LeadingZeroAdjustment > I think the name could make it clearer what it's computing. Maybe something Done. I like the new name. http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@170 PS2, Line 170: floor(log2(b)) > The table values are actually floor(log2(10^b)) for i = 0, 1, 2, 3... etc, Yes, exactly. Fixed the comment. http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@181 PS2, Line 181: same_sign > We usually make numeric/bool template arguments upper case to make it clear Separated this large function into 3 smaller ones, as you suggested. http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@185 PS2, Line 185: // This is expected to be handled by the caller. > What about the case when they're zero? That's also meant to be handled by t It wasn't possible for this condition to fail because at least one must be greater than zero for the following reasons: - If they are both zero, we will not be calling this function, because of the leading zero test. - If one is negative and the other is zero, we will invert it on line 327 I modified this check in the new implementation. Yes we do have cases, where we add zero. http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@272 PS2, Line 272: DCHECK_EQ(result_scale, std::max(this_scale, other_scale)); > A brief comment that this is guaranteed by the frontend migh help people ne Done http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@277 PS2, Line 277: DCHECK(!*overflow) << "Cannot overflow unless result is Decimal16Value"; > extra indent here Done http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@287 PS2, Line 287: if (this_scale < other_scale) { > It might be slightly easier to follow if the branches were(delta_scale < 0) Rewrote the code to make it more clear. http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@299 PS2, Line 299: into > long line. Done http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@305 PS2, Line 305: bool ovf = AdjustToSameScale(*this, this_scale, other, other_scale, > It feels a bit weird that we're calculating delta_scale above and inside Ad Delta scale is different in AdjustToSameScale(). There are two delta scales in this patch: between x and y and between max(x_scale, y_scale) and result_scale. What we are doing here and in AddLarge() is first adjust to the same scale, do the addition, then scale down to result_scale. http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@310 PS2, Line 310: if (delta_scale > 0) { > This might need a brief comment to explain why it's necessary (or perhaps t Added comment. http://gerrit.cloudera.org:8080/#/c/8309/2/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java: http://gerrit.cloudera.org:8080/#/c/8309/2/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java@228 PS2, Line 228:* TODO: implement V2 rules for ADD/SUB. > TODO not needed. Done -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 28 Oct 2017 00:02:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8404 ) Change subject: IMPALA-5017: Error on decimal overflow .. Patch Set 1: Ooops, canceled -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 27 Oct 2017 03:32:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8404 Change subject: IMPALA-5017: Error on decimal overflow .. IMPALA-5017: Error on decimal overflow Before this patch, decimal operations would either silently overflow (in the case of sum() and avg()), or produce a warning. In this patch, the beharior is changed so that an error is produced in the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behavior is unchanged. We introduce overflow checks when computing sum() and avg(). This results in a ~30% performance regression when we are in decimal v2 mode compared to decimal v1. Benchmarks: Query: select sum(dec_38_19) from decimal_tbl Decimal v1: 13.09s Decimal v2: 17.10s Query: select avg(dec_38_19) from decimal_tbl Decimal v1: 13.60s Decimal v2: 19.10s Testing: - Added several end to end tests. - Updated Expr tests to check for error in case of overflow. Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test M testdata/workloads/functional-query/queries/QueryTest/decimal.test 6 files changed, 186 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/8404/1 -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8398 Change subject: IMPALA-3436: Return a decimal when rounding a double .. IMPALA-3436: Return a decimal when rounding a double Before this patch, the function round(double, int) returned a double, which is inconsistent with other round() functions. In this patch, we change the return type of round(double, int) to be decimal only if decimal_v2 is enabled. If decimal_v2 is not enabled, the function returns a double, as before. This is implemented by introducing a "hack" where we make the parser aware of query options and rename the function to round_v1() during parsing if decimal_v2 is not enabled. To make this hack invisible, we also rename it back to round() in toSql(). This hack should be removed when we remove decimal v1. Testing: - Added EE tests. Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 --- M be/src/exprs/decimal-functions-ir.cc M be/src/exprs/decimal-functions.h M be/src/exprs/decimal-operators.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 8 files changed, 227 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8398/1 -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow
Taras Bobrovytsky has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8329 ) Change subject: IMPALA-4964: Fix Decimal modulo overflow .. IMPALA-4964: Fix Decimal modulo overflow The modulo operation between two decimals should never overflow. Before this patch, there would be an overflow if the scale difference between the two decimals was large. We would try to scale up the one with the smaller scale, so that the scales matched, which could result in an overflow. We fix the problem by first checking if the scaled up value would fit into 128 bits by estimating the number of leading zeros in it. If we detect that 128 bits is not enough, we convert both numbers to int256, do the operation, then convert back to 128 bits. Testing: - Added some expr tests that excercise the new code path. Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.inline.h 3 files changed, 125 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8329/3 -- To view, visit http://gerrit.cloudera.org:8080/8329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd Gerrit-Change-Number: 8329 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal modulo or divide by zero .. IMPALA-5018: Error on decimal modulo or divide by zero Before this patch, decimal operations would never produce an error. Division by and modulo zero would result in a NULL. In this patch, we change this behavior so that we raise an error instead of returning a NULL. We also modify the format of the decimal expr tests format to also include an error field. Testing: - Added several expr and end to end tests. Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 3 files changed, 682 insertions(+), 565 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/4 -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal modulo or divide by zero .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/decimal-operators-ir.cc File be/src/exprs/decimal-operators-ir.cc: http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/decimal-operators-ir.cc@729 PS3, Line 729: divide > My thought here when I suggested it was that the module was just an output In one of the previous comments, Vuk mentioned that Postgres returns an identical error message for both mod and division: "ERROR: division by zero" Personally, I don't think it's too confusing if it leave it is as. Or maybe it can be modified to "Cannot modulo decimal or divide decimal by zero" What do you think? http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/expr-test.cc@2264 PS3, Line 2264: { true, false, 0, 38, 19 }}}, > it would be good to test also Added tests to pytest. http://gerrit.cloudera.org:8080/#/c/8344/3/be/src/exprs/expr-test.cc@2506 PS3, Line 2506: 0/0 > I think we should even change the math function fmod to return a consistent Created IMPALA-6103. http://gerrit.cloudera.org:8080/#/c/8344/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test: http://gerrit.cloudera.org:8080/#/c/8344/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test@46 PS3, Line 46: cast > is that cast necessary? Let's also verify you get the error without the cas Done -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 24 Oct 2017 00:12:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow
Taras Bobrovytsky has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8329 ) Change subject: IMPALA-4964: Fix Decimal modulo overflow .. IMPALA-4964: Fix Decimal modulo overflow The modulo operation between two decimals should never overflow. Before this patch, there would be an overflow if the scale difference between the two decimals was large. We would try to scale up the one with the smaller scale, so that the scales matched, which could result in an overflow. We fix the problem by first checking if the scaled up value would fit into 128 bits by estimating the number of leading zeros in it. If we detect that 128 bits is not enough, we convert both numbers to int256, do the operation, then convert back to 128 bits. Testing: - Added some expr tests that excercise the new code path. Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.inline.h 3 files changed, 120 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8329/2 -- To view, visit http://gerrit.cloudera.org:8080/8329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd Gerrit-Change-Number: 8329 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8329 ) Change subject: IMPALA-4964: Fix Decimal modulo overflow .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@a281 PS1, Line 281: : > May help to keep this comment. Done http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@181 PS1, Line 181: CheckAdjustment > MinLeadingZero() or some other more meaningful name ? Done http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@497 PS1, Line 497: (other.value() == 0) > nit: parenthesis not needed. Done http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@499 PS1, Line 499: // Mod by 0. > Shouldn't we raise an error on decimal modulo 0 operations like you did in Erroring on mod 0 is handled in another of my patches. We raise an error in that patch after we set is_nan in this function. http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@498 PS1, Line 498: if (UNLIKELY(*is_nan)) { : // Mod by 0. : return DecimalValue(); : } > nit: one liner. Done http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@505 PS1, Line 505: if (sizeof(RESULT_T) < 16 || : result_precision < 38 || : // If the scales are the same, there is no danger in overflowing due to scaling up. : this_scale == other_scale || : detail::CheckAdjustment(value(), this_scale, other.value(), other_scale) >= 2) { > May help to add a comment to explain what this check is trying to achieve. Added a comment to make it more clear. Added a comment about overflow at the bottom. Mod() should never overflow. http://gerrit.cloudera.org:8080/#/c/8329/1/be/src/runtime/decimal-value.inline.h@514 PS1, Line 514: DCHECK( > DCHECK_LT(); Unfortunately this does not work with int128_t. -- To view, visit http://gerrit.cloudera.org:8080/8329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd Gerrit-Change-Number: 8329 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 23 Oct 2017 23:19:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal modulo or divide by zero .. Patch Set 3: I spoke to Greg and Alex yesterday, and we agreed that decimal_v2 should always be in strict mode. This means that when decimal_v2 is enabled, we should always error on overflows and division by zero. This is the behavior that more traditional databases have. -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 21 Oct 2017 19:15:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal modulo or divide by zero .. Patch Set 3: Rebased. -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 21 Oct 2017 00:19:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal modulo or divide by zero .. IMPALA-5018: Error on decimal modulo or divide by zero Before this patch, decimal operations would never produce an error. Division by and modulo zero would result in a NULL. In this patch, we change this behavior so that we raise an error instead of returning a NULL. We also modify the format of the decimal expr tests format to also include an error field. Testing: - Added several expr and end to end tests. Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 3 files changed, 656 insertions(+), 565 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/3 -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal modulo or divide by zero .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8344/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8344/2/be/src/exprs/expr-test.cc@1561 PS2, Line 1561: expected null, scaled_val, precision, scale for V1 : //expected null, scaled_val, precision, scale for V2 }} > that needs updating Done -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 21 Oct 2017 00:19:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5018: Error on decimal divide by and modulo zero
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal divide by and modulo zero .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/decimal-operators-ir.cc File be/src/exprs/decimal-operators-ir.cc: http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/decimal-operators-ir.cc@737 PS1, Line 737: Decimal division by or modulo zero > For either / or %, postgres reports: Done http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/expr-test.cc@1567 PS1, Line 1567: { "cast(1.23 as decimal(8,2)) + cast(1 as decimal(20,3))", {{ false, false, 2230, 21, 3 }}}, > Long lines. Done http://gerrit.cloudera.org:8080/#/c/8344/1/be/src/exprs/expr-test.cc@1567 PS1, Line 1567: { "cast(1.23 as decimal(8,2)) + cast(1 as decimal(20,3))", {{ false, false, 2230, 21, 3 }}}, > Long lines. Done -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 20 Oct 2017 19:56:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8344 ) Change subject: IMPALA-5018: Error on decimal modulo or divide by zero .. IMPALA-5018: Error on decimal modulo or divide by zero Before this patch, decimal operations would never produce an error. Division by and modulo zero would result in a NULL. In this patch, we change this behavior so that we raise an error instead of returning a NULL. We also modify the format of the decimal expr tests format to also include an error field. Testing: - Added several expr and end to end tests. Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 3 files changed, 654 insertions(+), 563 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/2 -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5018: Error on decimal divide by and modulo zero
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8344 Change subject: IMPALA-5018: Error on decimal divide by and modulo zero .. IMPALA-5018: Error on decimal divide by and modulo zero Before this patch, decimal operations would never produce an error. Division by and modulo zero would result in a NULL. In this patch, we change this behavior so that we raise an error instead of returning a NULL. We also modify the format of the decimal expr tests format to also include an error field. Testing: - Added several expr and end to end tests. Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 3 files changed, 619 insertions(+), 551 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8344/1 -- To view, visit http://gerrit.cloudera.org:8080/8344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Gerrit-Change-Number: 8344 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8329 Change subject: IMPALA-4964: Fix Decimal modulo overflow .. IMPALA-4964: Fix Decimal modulo overflow The modulo operation between two decimals should never overflow. Before this patch, there would be an overflow if the scale difference between the two decimals was large. We would try to scale up the one with the smaller scale, so that the scales matched, which could result in an overflow. We fix the problem by first checking if the scaled up value would fit into 128 bits by estimating the number of leading zeros in it. If we detect that 128 bits is not enough, we convert both numbers to int256, do the operation, then convert back to 128 bits. Testing: - Added some expr tests that excercise the new code path. Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.inline.h 3 files changed, 116 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8329/1 -- To view, visit http://gerrit.cloudera.org:8080/8329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd Gerrit-Change-Number: 8329 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. IMPALA-5019: Decimal V2 addition In this patch, we implement the new decimal return type rules for addition expressions. These rules become active when the query option DECIMAL_V2 is enabled. The algorithm for determining the type of the result is described in the JIRA. DECIMAL V1: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,38) | ++ DECIMAL V2: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,6) | ++ This patch required backend changes. We implement an algorithm where we handle the whole and fractional parts separately, and then combine them to get the final result. This is more complex and slower. We try to avoid this by first checking if the result would into int128. Testing: - Added expr tests. - Tested locally on my machine with a script that generates random decimal numbers and checks that Impala adds them correctly. Performance: For the common case, performance remains the same. select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1) BEFORE: 4.74s AFTER: 4.73s In this case, we check if it is necessary to do the complex addition, and it turns out to be not necessary. We see a slowdown because the result needs to be scaled down by dividing. select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19) BEFORE: 1.63s AFTER: 13.57s In following case, we take the most complex path and see the most signification perfmance hit. select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37)) BEFORE: 1.63s AFTER: 20.57 Change-Id: I401049c56d910eb1546a178c909c923b01239336 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 4 files changed, 297 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/2 -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8309/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java: http://gerrit.cloudera.org:8080/#/c/8309/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java@268 PS1, Line 268: // Not yet implemented - fall back to V1 rules. I think this can now be removed now. -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 18 Oct 2017 02:30:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8309 Change subject: IMPALA-5019: Decimal V2 addition .. IMPALA-5019: Decimal V2 addition In this patch, we implement the new decimal return type rules for addition expressions. These rules become active when the query option DECIMAL_V2 is enabled. The algorithm for determining the type of the result is described in the JIRA. DECIMAL V1: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,38) | ++ DECIMAL V2: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,6) | ++ This patch required backend changes. We implement an algorithm where we handle the whole and fractional parts separately, and then combine them to get the final result. This is more complex and slower. We try to avoid this by first checking if the result would into int128. Testing: - Added expr tests. - Tested locally on my machine with a script that generates random decimal numbers and checks that Impala adds them correctly. Performance: For the common case, performance remains the same. select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1) BEFORE: 4.74s AFTER: 4.73s In this case, we check if it is necessary to do the complex addition, and it turns out to be not necessary. We see a slowdown because the result needs to be scaled down by dividing. select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19) BEFORE: 1.63s AFTER: 13.57s In following case, we take the most complex path and see the most signification perfmance hit. select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37)) BEFORE: 1.63s AFTER: 20.57 Change-Id: I401049c56d910eb1546a178c909c923b01239336 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 3 files changed, 282 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/1 -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/6023 ) Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@516 PS7, Line 516: int256_t x = ConvertToInt256(buckets.value()) * ConvertToInt256(width_size.value()); > Sure, that's true. I don't think that this function is meant to be used tha This idea may give a nice performance boost (if it works) because all the heavy lifting (such as dividing and maybe converting to int256) is done once at the beginning when we are constructing the array. The array will contain values that have the same precision and scale as the input expression so all we have to do is O(log(number of buckets)) comparisons of expr to array elements in case of binary search. Or O(number of buckets) comparisons if number of buckets is small. If number of buckets is very large, we can fall back to the current implementation. (we can have 2 versions WidthBucketImpl such as WidthBucketImplSmall WidthBucketImplLarge) -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-Change-Number: 6023 Gerrit-PatchSet: 7 Gerrit-Owner: anujphadkeGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 10 Oct 2017 00:49:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/6023 ) Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/expr-test.cc@4247 PS7, Line 4247: TestValue("width_bucket(6.3, 2, 17, 2)", TYPE_BIGINT, 1); You should include a case like (1, 1, 6, 3) http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@429 PS7, Line 429: bucket_width This should be called bucket_number to make it more clear http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@431 PS7, Line 431: width_size width_size is a confusing name. This should be called something like "distance_from_min". http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@516 PS7, Line 516: int256_t x = ConvertToInt256(buckets.value()) * ConvertToInt256(width_size.value()); > Won't this require a lot of memory to create such a array? Sure, that's true. I don't think that this function is meant to be used that way though. It's difficult to imagine someone wanting to use over 1000 buckets. In that case, we could fall back to the current implementation. -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-Change-Number: 6023 Gerrit-PatchSet: 7 Gerrit-Owner: anujphadkeGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 10 Oct 2017 00:41:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 12: Code-Review+2 Fixed the uninitialized variable bug in expr-test.cc. Forwarding the +2. Thanks Jim! -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 12 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 04 Oct 2017 05:26:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 5 files changed, 333 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/12 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 12 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 11: Code-Review+2 Updated FE tests. Forwarding the +2. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 11 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 04 Oct 2017 02:49:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 5 files changed, 333 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/11 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 11 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 ) Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer .. Patch Set 11: The run failed due to IMPALA-6009. Rebase and try again. -- To view, visit http://gerrit.cloudera.org:8080/7061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 Gerrit-Change-Number: 7061 Gerrit-PatchSet: 11 Gerrit-Owner: John ShermanGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Comment-Date: Tue, 03 Oct 2017 22:36:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 10: Code-Review+2 Rebased, forwarding the +2 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 10 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 03 Oct 2017 21:51:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 330 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/10 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 10 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 330 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/9 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 9 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1672 PS8, Line 1672: not require int256. > if result does not require int256, why does it overflow (return null)? Added comment. http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1675 PS8, Line 1675: { true, 0, 38, 0 }}}, > when both V1 and V2 give the same answer, you don't have to include both re Done http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1680 PS8, Line 1680: { "cast(18446744073709551615 as decimal(38,0)) * cast(9223372036854775808 as decimal(38,0))", > It might be helpful if you mentioned how these values where computed in the Added comment. http://gerrit.cloudera.org:8080/#/c/7438/8/be/src/exprs/expr-test.cc@1702 PS8, Line 1702: 85070591730234615865843651857942052864 > how did you verify these results? I used Python to make sure I get the same results. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 8 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 03 Oct 2017 21:26:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 8: I made some additional changes in patch 8 as a result of thinking about your comment. Dan, can you take a look one more time? -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 8 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 03 Oct 2017 02:55:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 332 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/8 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 8 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/7438/7/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/7438/7/be/src/runtime/decimal-value.inline.h@249 PS7, Line 249: } else { > maybe a quick comment: Yes, your understanding is correct. Added comment. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 7 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 03 Oct 2017 02:53:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 292 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/7 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 7 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 6: I think we should explore your suggestion to try SEE/AVX in a follow on patch. (to optimze both multiplication and division if it's possible) -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 6 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 27 Sep 2017 01:24:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 289 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/6 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 6 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7438/5/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 203: RESULT_T result = value / multiplier; > Another idea: there are a finite number of possible delta_scale values. It After trying different approaches, this one produced the best results: http://github.mtv.cloudera.com/Taras/Impala/commit/950892a48e555ba24dcca772851d251ae3ecb17f Before: 66.13s After: 55.63s So, a small improvement. Still not that great: without the multiplication patch, it took 8.26s -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Benchmarks: In the following benchmarks, we are selecting from lineitem_big, which has about 100 times as many rows as the normal lineitem. Query: select sum(l_quantity * l_tax) + sum(l_extendedprice * l_discount) from lineitem_big; Before: DECIMAL_V2 disabled: 2.24s DECIMAL_V2 enabled : 2.14 After: DECIMAL_V2 disabled: 2.66s DECIMAL_V2 enabled : 2.25s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 0))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 0))) from lineitem_big Before: DECIMAL_V2 disabled: 2.34s DECIMAL_V2 enabled : 2.36s After: DECIMAL_V2 disabled: 4.25s DECIMAL_V2 enabled : 4.15s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37))) from lineitem_big Before: DECIMAL_V2 disabled: 2.84s DECIMAL_V2 enabled : 8.26s After: DECIMAL_V2 disabled: 69.16s DECIMAL_V2 enabled : 66.13s Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 282 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/5 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG Commit Message: Line 7: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication > The same JIRA is listed twice. Done http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 301: // The following is frought with apparent difficulty, as there is only 1 bit > It seems most efficient development-wise if we give this a try now while we Implemented checking the leading zeros optimization. It doesn't seem to have a big impact on the performance of the queries I tried. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Benchmarks: In the following benchmarks, we are selecting from lineitem_big, which has about 100 times as many rows as the normal lineitem. Query: select sum(l_quantity * l_tax) + sum(l_extendedprice * l_discount) from lineitem_big; Before: DECIMAL_V2 disabled: 2.24s DECIMAL_V2 enabled : 2.14 After: DECIMAL_V2 disabled: 2.66s DECIMAL_V2 enabled : 2.25s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 0))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 0))) from lineitem_big Before: DECIMAL_V2 disabled: 2.34s DECIMAL_V2 enabled : 2.36s After: DECIMAL_V2 disabled: 4.25s DECIMAL_V2 enabled : 4.15s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37))) from lineitem_big Before: DECIMAL_V2 disabled: 2.84s DECIMAL_V2 enabled : 8.26s After: DECIMAL_V2 disabled: 69.16s DECIMAL_V2 enabled : 66.13s Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 282 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/5 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG Commit Message: Line 75: DECIMAL_V2 disabled: 2.65s > So we go down this path for multiplies that overflow, or are close to overf Running perf top showed that most of the time is spent in ScaleDownAndRound(). It looks like it's the division (of one int256 by another int256) that is taking the most time. I also noticed that disabling the codegen slows down the query in the BEFORE case by ~10x. So the performance difference before my patch and after is huge only when codegen is enabled. Otherwise, it's like a 3x difference. (Do you think it's most likely due to the scanner being much faster in the codegen case?) -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool, which is where the constant length partial tuple data is written. We also make sure that boundary pool does not hold any tuple data of returned batches by always deep copying it to output batches. Testing: - Ran some tests locally on ASAN build. - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. This change was helpful for finding issues while developing this patch. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 65 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/6 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Hello Alex Behm, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7639 to look at the new patch set (#6). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool, which is where the constant length partial tuple data is written. We also make sure that boundary pool does not hold any tuple data of returned batches by always deep copying it to output batches. Testing: - Ran some tests locally on ASAN build. - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. This change was helpful for finding issues while developing this patch. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 65 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/6 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has posted comments on this change. Change subject: MPALA-5776: Write partial tuple to the correct mempool .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7639/5/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 297: boundary_column_.Clear(); > Can we use the existing CopyBoundaryField() function for consistency? Other I tried it and it doesn't quite work (without clearing the boundary field) This is a slightly different case. CopyBoundaryField() copies the contents of the field followed by the boundary column to memory allocated from the row batch. In this case, after FillColumns(), field_locations is pointing at the data in the boundary column. We can trick CopyBoundaryField() by clearing the boundary column right before CopyBoundaryField(), but it's such a weird thing to do. (It looks weird because we just cleared the boundary column, why are we trying to copy out of it?) Another thing we can do, is check that the field is pointing to a different memory location than the boundary column, but I also think that's weird. I suggest we leave it as is (for me, the way it is now is the least confusing way). What do you think? -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has posted comments on this change. Change subject: MPALA-5776: Write partial tuple to the correct mempool .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7639/5//COMMIT_MSG Commit Message: PS5, Line 7: MPALA Oops. -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: MPALA-5776: Write partial tuple to the correct mempool .. MPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool_, which is where the constant length partial tuple data is written. We also make sure that boundary pool does not hold any tuple data of returned batches by always deep copying it to output batches. Testing: - Ran some tests locally on ASAN build. - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. This change was helpful for finding issues while developing this patch. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 65 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/5 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7639 to look at the new patch set (#5). Change subject: MPALA-5776: Write partial tuple to the correct mempool .. MPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool_, which is where the constant length partial tuple data is written. We also make sure that boundary pool does not hold any tuple data of returned batches by always deep copying it to output batches. Testing: - Ran some tests locally on ASAN build. - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. This change was helpful for finding issues while developing this patch. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 65 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/5 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 808: // The partial tuple pool is cleared after the data has been copied out of it to > Another idea: Add a separate function for copying the partial tuple into th Created a function for copying the partial tuple into output batch. http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: PS4, Line 317: partial_tuple_empty_ > Comment needs update Done http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 194: boost::scoped_ptr boundary_pool_; > So my question is really whether there's a good reason why the lifetime nee Fixed the place where we forget to copy the data out. Removed the new memory pool that I added. Line 199: StringBuffer boundary_row_; > Can you clarify whether this data will be referenced by the output batches. Done Line 202: StringBuffer boundary_column_; > Same here. It looks like some attempt is made to copy data out of this colu Done -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 194: boost::scoped_ptr boundary_pool_; > I still don't understand the memory lifetime for boundary_row_, boundary_co Data is not always copied out of boundary_col. It can end up here: http://github.mtv.cloudera.com/CDH/Impala/blob/e5e444a89008a22f987c517ef1ecaa9f1693b060/be/src/exec/text-converter.inline.h#L71 This happens if reuse_data is true, which it always is if codegen is enabled. So it's not safe to free the boundary pool. So the life time of the boundary pool is different than the partial tuple pool. Maybe this can be addressed in a different patch? http://gerrit.cloudera.org:8080/#/c/7639/4/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: Line 217: num_corruptions = rng.randint(0, int(math.log(len(data > Do these changes reproduce the bug? No, I don't think they reproduce the bug, but I think these changes are good to make in general. -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple constant and variable length data. The new memory pool does not hold tuple data of returned batches because the data is always deep-copied into the output batch. Testing: - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. - Ran scanner tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 46 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/4 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple constant and variable length data. The new memory pool does not hold tuple data of returned batches because the data is always deep-copied into the output batch. Testing: - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. - Ran scanner tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 46 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/4 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/7639/3/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 804: // Copy over all materialized slots in the partial tuple. > Only describes what the code does. How about something like: Done Line 808: // prevent the accumulation of variable length data in it. We also recreate the > The partial tuple isn't re-created here Done http://gerrit.cloudera.org:8080/#/c/7639/3/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 186: /// Utility function to write out 'num_fields' to 'tuple_'. This is used to parse > Comment is weird, I suggest rephrasing to something like: Done Line 227: /// Mem pool for the partial tuple. > Mention that it does not hold tuple data of returned batches because the da Done Line 230: /// Memory to store partial tuples split across buffers. Memory comes from > Mention that this tuple is always deep copied into the output batch Done Line 231: /// partial_tuple_pool_. There is only one tuple allocated for this object and reused > The 2nd sentence does not seem to apply anymore because we realloc the part Done -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple will contains partial tuple contant length and variable length data. Testing: - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. - Ran scanner tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 46 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/4 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple will contains partial tuple contant length and variable length data. Testing: - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. - Ran scanner tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 42 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple will contains partial tuple contant length and variable length data. Testing: - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. - Ran scanner tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 42 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple will contains partial tuple contant length and variable length data. Testing: - Updated test_scanners_fuzz.py to make slightly more significant changes to the data files. - Ran scanner tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M tests/query_test/test_scanners_fuzz.py 3 files changed, 43 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has posted comments on this change. Change subject: MPALA-5776: Write partial tuple to the correct mempool .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 169: row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), false); > I don't think that data from the boundary pool is ever used outside of this Actually, after running some tests with ASAN, it turns out that it's not ok to clear the boundary pool. The boundary col contains some var len data that is used after Close() is called. Also, what do you mean we should be transferring the contents of boundary pool below instead of clearing it? Do you mean in the else clause? If row batch is null, where would would we transfer the contents to? -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has posted comments on this change. Change subject: MPALA-5776: Write partial tuple to the correct mempool .. Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/7639/2//COMMIT_MSG Commit Message: Line 7: MPALA-5776: Write partial tuple to the correct mempool > Missing I. Done http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 169: row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), false); > Does this even make sense? are there any cases where it's valid for the row I don't think that data from the boundary pool is ever used outside of this class, so it makes sense to clear it. I analyzed the code, and it doesn't look to me like data in boundary pool is ever used, but how can we be 100% sure? Line 269: bool eosr = true; > Remove. This not needed/used. It's used on line 277. Are you suggesting to remove it from the signature of FillByteBuffer as well? Line 299: boundary_column_.Clear(); > What's the point of clearing the boundary column here? I don't think the sc Removed this. PS2, Line 765: batches > I'm not sure that I understand the reference to batches. Do they mean block Yeah, "batches" doesn't make sense here. Changed it to blocks. Line 806: partial_tuple_->DeepCopy(tuple_, *scan_node_->tuple_desc(), pool); > Can you add a column clarifying what the intent is. E.g. "Copy over all mat Done Line 808: boundary_row_.Reset(); > Another idea: Add a separate function for copying the partial tuple into th I added a new mem pool, so boundary row and columns are not affected any more. Alex, do you guys think it's still worth creating a separate function? PS2, Line 810: emptied > cleared? The comment makes me thing that all the memory has been freed, but Done Line 814: partial_tuple_ = Tuple::Create(tuple_byte_size_, boundary_pool_.get()); > Is it possible to just initialize partial_tuple_ when it's needed? I.e. bef Done Line 896: int num_fields, bool copy_strings) { > Can you remove the 'copy_strings' parameter? It's not used in this function Done http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 188: /// the boundary pool. > Hah! If only the code had matched the comment... totally agree Line 194: /// Mem pool for boundary_row_ and boundary_column_. > Needs updating since it also backs partial_tuple_. I made changes in the next patch so that this comment is now true. -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple will contains partial tuple contant length and variable length data. Testing: - Ran some tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h 2 files changed, 29 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-5776: Write partial tuple to the correct mempool .. IMPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by creating a new memory pool for the partial tuple will contains partial tuple contant length and variable length data. Testing: - Ran some tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h 2 files changed, 29 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2422: Fix escaping in LIKE clause
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-2422: Fix escaping in LIKE clause .. IMPALA-2422: Fix escaping in LIKE clause There are two stages to processing a like clause. First, we determine if it is possible to convert the expression to a simpler function, such as StartsWith() or EndsWith(). If not, then we use a Regex libarary to compute the result. There was a problem in the logic that determines if it is possible to use a simpler function. It did not take into account escape characters. For example, "abc\%" was incorrectly converted to StartsWith("abc\"). Testing: -Added expr tests. Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b --- M be/src/exprs/expr-test.cc M be/src/exprs/like-predicate.cc 2 files changed, 28 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7660/2 -- To view, visit http://gerrit.cloudera.org:8080/7660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-2422: Fix escaping in LIKE clause
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-2422: Fix escaping in LIKE clause .. IMPALA-2422: Fix escaping in LIKE clause There are two stages to processing a like clause. First, we determine if it is possible to convert the expression to a simpler function, such as StartsWith() or EndsWith(). If not, then we use a Regex libarary to compute the result. There was a problem in the logic that determines if it is possible to use a simpler function. It did not take into account escape characters. For example, "abc\%" was incorrectly converted to StartsWith("abc\"). Testing: -Added expr tests. Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b --- M be/src/exprs/expr-test.cc M be/src/exprs/like-predicate.cc 2 files changed, 28 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7660/2 -- To view, visit http://gerrit.cloudera.org:8080/7660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5681: release reservation from blocking operators
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5681: release reservation from blocking operators .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7619/4/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS4, Line 1203: put a comma here -- To view, visit http://gerrit.cloudera.org:8080/7619 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2422: Fix escaping in LIKE clause
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/7660 Change subject: IMPALA-2422: Fix escaping in LIKE clause .. IMPALA-2422: Fix escaping in LIKE clause There are two stages to processing a like clause. First, we determine if it is possible to convert the expression to a simpler function, such as StartsWith() or EndsWith(). If not, then we use a Regex libarary to compute the result. There was a problem in the logic that determines if it is possible to use a simpler function. It did not take into account escape characters. For example, "abc\%" was incorrectly converted to StartsWith("abc\"). Testing: -Added expr tests. Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b --- M be/src/exprs/expr-test.cc M be/src/exprs/like-predicate.cc 2 files changed, 28 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7660/1 -- To view, visit http://gerrit.cloudera.org:8080/7660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iec0f5d99c855ea8d4fe1bcf28c05780ba315034b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: MPALA-5776: Write partial tuple to the correct mempool .. MPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool_, which is where the constant length partial tuple data is written. Testing: - Ran some tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc 1 file changed, 13 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: MPALA-5776: Write partial tuple to the correct mempool .. MPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool_, which is where the constant length partial tuple data is written. Testing: - Ran some tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc 1 file changed, 13 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: MPALA-5776: Write partial tuple to the correct mempool .. MPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool_, which is where the constant length partial tuple data is written. Testing: - Ran some tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc 1 file changed, 13 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: MPALA-5776: Write partial tuple to the correct mempool .. MPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool_, which is where the constant length partial tuple data is written. Testing: - Ran some tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc 1 file changed, 14 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/7639 Change subject: MPALA-5776: Write partial tuple to the correct mempool .. MPALA-5776: Write partial tuple to the correct mempool In the text scanner, we were writing the partial tuple variable length data to data_buffer_pool_ mempool which caused strange behavior, such as incorrect results. If we are scanning compressed data, the pool gets attached to the row batch at the end of a GetNext() call and gets freed before the next GetNext() call. This is wrong because we expect the data in the partial tuple to survive between the GetNext() calls. If we are scanning non compressed data, data_buffer_pool_ never gets cleared and grows over time until the scanner finishes reading the scan range. We fix the problem by writing the varlen partial tuple data to boundary_pool_, which is where the constant length partial tuple data is written. Testing: - Ran some tests locally on ASAN build. - No new tests were added, because it is difficult to construct test cases due to the issue being non-deterministic. Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 --- M be/src/exec/hdfs-text-scanner.cc 1 file changed, 11 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/1 -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG Commit Message: Line 75: DECIMAL_V2 disabled: 2.65s > This is scary and I don't think we should consider merging this until it's It takes time to convert a number from 128 bits to 256 bits. Also, Boost does something like 4 128 bit multiplications internally in the 256 bit multiplication function (so it makes sense for it to be at least 4x slower). I think the 256 bit multiplication function is also not optimized for the worst case (where the result actually needs more than 128 bits). Also, it takes time to convert 2 numbers to 256 bit numbers and then convert 1 number back. I don't think it's that surprising that it's much slower. 256 bit multiplication is slow and is not implemented in hardware like others. Remember, this benchmark is the absolute worst case scenario, where we do a 256 bit multiplication for every row. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-5722: Fix string to decimal cast .. IMPALA-5722: Fix string to decimal cast When converting a string to a decimal, we didn't handle the case where the exponent is a large negative number. The function for computing the divisor returns -1 when the exponent is too large. The problem is fixed by making sure that the divisor is positive. Testing: -Added decimal tests. Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe --- M be/src/runtime/decimal-test.cc M be/src/util/decimal-util.h M be/src/util/string-parser.h 3 files changed, 31 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7517/2 -- To view, visit http://gerrit.cloudera.org:8080/7517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4939: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Benchmarks: Query: select sum(l_quantity * l_tax) + sum(l_extendedprice * l_discount) from lineitem_big; Before: DECIMAL_V2 disabled: 6.68s DECIMAL_V2 enabled : 6.77s After: DECIMAL_V2 disabled: 6.66s DECIMAL_V2 enabled : 6.58s In the above benchmark, we are selecting from lineitem_big, which has about 100 times as many rows as the normal lineitem. Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37))) from lineitem Before: DECIMAL_V2 disabled: 0.23s DECIMAL_V2 enabled : 0.45s After: DECIMAL_V2 disabled: 2.65s DECIMAL_V2 enabled : 2.54s The query is slower after the patch because the intermediate result is int256. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 253 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/4 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 301: // doubling such a value would overflow in two's complement. However, we > A while loop will perform terribly, but there are intrinsics that can be us Added a TODO. http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 65: return value < 0 ? -1 : 1; > I think we want to think carefully about using the Sign function with int25 I included the benchmark result in the commit message. http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/util/decimal-util.h File be/src/util/decimal-util.h: Line 336 > Really? That's crazy. I removed this function so that the function on line 58 would get called. http://gerrit.cloudera.org:8080/#/c/7438/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java: Line 260: resultPrecision = p1 + p2 + 1; > I think the 0.999 * 0.999 case is a fluke - we are sacrificing precision fo This is more of a product question. I'll ask Greg about this later. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4939: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Benchmarks: Query: select sum(l_quantity * l_tax) + sum(l_extendedprice * l_discount) from lineitem_big; Before: DECIMAL_V2 disabled: 6.68s DECIMAL_V2 enabled : 6.77s After: DECIMAL_V2 disabled: 6.66s DECIMAL_V2 enabled : 6.58s In the above benchmark, we are selecting from lineitem_big, which has about 100 times as many rows as the normal lineitem. Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37))) from lineitem Before: DECIMAL_V2 disabled: 0.23s DECIMAL_V2 enabled : 0.45s After: DECIMAL_V2 disabled: 2.65s DECIMAL_V2 enabled : 2.54s The query is slower after the patch because the intermediate result is int256. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 253 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/4 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5722: Fix string to decimal cast .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/runtime/decimal-test.cc File be/src/runtime/decimal-test.cc: Line 305: // When the absolute value of the exponent becomes too large, it is considered > This seems a bit weird - it seems like it should be underflow. It seems lik Changed the be code so that a large negative number would lead to an underflow (instead of overflow). http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/util/decimal-util.h File be/src/util/decimal-util.h: Line 63: DCHECK_GE(result * 10, result); > The generic integer overflow support wasn't added until gcc5 I believe. Agr Added a TODO. http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/util/string-parser.h File be/src/util/string-parser.h: PS1, Line 225: DCHECK > DCHECK_EQ DCHECK_EQ doesn't work with int128. Added a comment. -- To view, visit http://gerrit.cloudera.org:8080/7517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-5722: Fix string to decimal cast .. IMPALA-5722: Fix string to decimal cast When converting a string to a decimal, we didn't handle the case where the exponent is a large negative number. The function for computing the divisor returns -1 when the exponent is too large. The problem is fixed by making sure that the divisor is positive. Testing: -Added decimal tests. Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe --- M be/src/runtime/decimal-test.cc M be/src/util/decimal-util.h M be/src/util/string-parser.h 3 files changed, 31 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7517/2 -- To view, visit http://gerrit.cloudera.org:8080/7517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5636: changed the format metadata of repetition level
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5636: changed the format metadata of repetition level .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7514/2//COMMIT_MSG Commit Message: Line 7: IMPALA-5636: changed the format metadata of repetition level from bit_packing to RLE > Try to keep the subject line short (if possible, something like 50 characte It looks like patch #3 is a draft (it should be published). Patch #3 is only visible to those who commented on the patch. -- To view, visit http://gerrit.cloudera.org:8080/7514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5636: changed the format metadata of repetition level
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5636: changed the format metadata of repetition level .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7514/2//COMMIT_MSG Commit Message: Line 7: IMPALA-5636: changed the format metadata of repetition level from bit_packing to RLE Try to keep the subject line short (if possible, something like 50 characters) Also, use the imperative mood in the subject line. i.e. replace the word "changed" with "Change". Also, capitalize the first letter. Give a brief description of the patch and why it's needed in the body of the commit message. -- To view, visit http://gerrit.cloudera.org:8080/7514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi left joins
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5689: Avoid inverting non-equi left joins .. Patch Set 4: Code-Review+2 Rebased and forwarding the +2 -- To view, visit http://gerrit.cloudera.org:8080/7476 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi left joins
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-5689: Avoid inverting non-equi left joins .. IMPALA-5689: Avoid inverting non-equi left joins When checking if a join can be inverted, we forgot to also check that the resulting join would not be a non-equi right semi-join or a non-equi right outer-join. We currently do not support those kinds of joins in the backend. Testing: -Added a planner test Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab --- M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-loop-join.test 3 files changed, 80 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/7476/4 -- To view, visit http://gerrit.cloudera.org:8080/7476 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi left joins
Hello Bharath Vissapragada, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7476 to look at the new patch set (#4). Change subject: IMPALA-5689: Avoid inverting non-equi left joins .. IMPALA-5689: Avoid inverting non-equi left joins When checking if a join can be inverted, we forgot to also check that the resulting join would not be a non-equi right semi-join or a non-equi right outer-join. We currently do not support those kinds of joins in the backend. Testing: -Added a planner test Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab --- M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-loop-join.test 3 files changed, 80 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/7476/4 -- To view, visit http://gerrit.cloudera.org:8080/7476 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5689: Avoid inverting non-equi right joins
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5689: Avoid inverting non-equi right joins .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7476/3//COMMIT_MSG Commit Message: PS3, Line 7: right joins > left joins? Done PS3, Line 10: resulting join would not be a non-equi left semi or non-equi left : outer join > since you are talking about the resulting join, you should mean " the resul Done -- To view, visit http://gerrit.cloudera.org:8080/7476 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5722: Fix string to decimal cast .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7517/1/be/src/util/string-parser.h File be/src/util/string-parser.h: Line 221: T divisor = DecimalUtil::GetScaleMultiplier(shift); > I think it can be a huge negative number once it overflows. Look at decimal-util.h line 174, where -1 is returned. The function you were looking at, Tim, is called only in the case of int256. -- To view, visit http://gerrit.cloudera.org:8080/7517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5722: Fix string to decimal cast
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/7517 Change subject: IMPALA-5722: Fix string to decimal cast .. IMPALA-5722: Fix string to decimal cast When converting a string to a decimal, we didn't handle the case where the exponent is a large negative number. The function for computing the divisor returns -1 when the exponent is too large. The problem is fixed by making sure that the divisor is positive. Testing: -Added decimal tests. Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe --- M be/src/runtime/decimal-test.cc M be/src/util/decimal-util.h M be/src/util/string-parser.h 3 files changed, 23 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7517/1 -- To view, visit http://gerrit.cloudera.org:8080/7517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1f5eb5b64a6924af2e8eb7f9a50da67015757efe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4939: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 237 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/3 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4939: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 237 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/3 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4939: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 237 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/3 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5679: Fix Parquet count(*) with group by string
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/7481 Change subject: IMPALA-5679: Fix Parquet count(*) with group by string .. IMPALA-5679: Fix Parquet count(*) with group by string In a recent patch (IMPALA-5036) a bug was introduced where a count(*) query with a group by a string partition column returned incorrect results. Data was being written into the tuple at an incorrect offset. Testing: - Added an end to end test where we are selecting from a table partitioned by string. Change-Id: I225547574c2b2259ca81cb642d082e151f3bed6b --- M be/src/exec/hdfs-scan-node-base.h M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test M tests/query_test/test_aggregation.py 3 files changed, 26 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7481/1 -- To view, visit http://gerrit.cloudera.org:8080/7481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I225547574c2b2259ca81cb642d082e151f3bed6b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky