[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the u
Laurel Hale has uploaded a new change for review. http://gerrit.cloudera.org:8080/6070 Change subject: IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the upstream docs. This removes the CM references from the rendered docs. A .. IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the upstream docs. This removes the CM references from the rendered docs. A subsequent cleanup project will remove occurrences of CM from the XML. Change-Id: I4967fae275a8822274aece14a15107237445aba5 --- M docs/topics/impala_isilon.xml M docs/topics/impala_logging.xml M docs/topics/impala_s3.xml M docs/topics/impala_webui.xml 4 files changed, 78 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6070/1 -- To view, visit http://gerrit.cloudera.org:8080/6070 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4967fae275a8822274aece14a15107237445aba5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale
[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph s
Laurel Hale has uploaded a new change for review. http://gerrit.cloudera.org:8080/6069 Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph suitable for upstream documentation. This hides the mention of Cloudera .. IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph suitable for upstream documentation. This hides the mention of Cloudera Manager in the rendered documentation. In a subsequent cleanup project, the "Cloudera Manager" mentions will be removed from the XML. Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1 --- M docs/shared/impala_common.xml M docs/topics/impala_perf_resources.xml M docs/topics/impala_perf_skew.xml M docs/topics/impala_perf_testing.xml M docs/topics/impala_scalability.xml M docs/topics/impala_txtfile.xml 6 files changed, 49 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/6069/1 -- To view, visit http://gerrit.cloudera.org:8080/6069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Alex Behm has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 215: int msk_transition_day = 2456956; const for these three vars -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Alex Behm has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 226: tv.time().hours() < (msk_transition_hour_utc + msk_utc_offset) % 24)) { Is this correct? Before, we had tv.time().hours() < 1. Now we have 22 + 4 % 24 which is 2. -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Jim Apple has posted comments on this change. Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6068/1//COMMIT_MSG Commit Message: Line 22: decimal V2 clauses. Will investigate further. I think this probably should be fixed in this commit. -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Jim Apple has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS5, Line 1122: ((double) state->source_size - r) / state->source_size I'm not yet convinced this is correct. I'm not convinced it is correct in HEAD. First, a nit: I think the denominator should be increased by 1. (I think the expected value of the maximum of two independent random variables over the uniform distribution on (0,1) is 2/3, not 3/4) Second, while I understand that this produces values in the given range, I don't think I understand how this implements the algorithm mentioned in the .h file, in which values of weight k are mapped to r^(1/k), where r is uniform random from (0,1). -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Inline big number strings
Zach Amsden has uploaded a new patch set (#7). Change subject: IMPALA-2020: Inline big number strings .. IMPALA-2020: Inline big number strings We can directly spcecify these in the header instead of keeping the constants out in a .o. This lets the compiler inline them in a lot more places, potentially giving better behavior than a table lookup by directly inserting the constant. This in turn allows divides by now known compile time constants to be converted into multiply. Avoiding the table lookup seems to be a win for other types as well, giving anywhere from 18% to ~33% speedup over baseline. This is likely to become even more imporant with big integer divide, which requires additional computation for DECIMAL_V2. Testing: Compare the binary output of decimal-util.cc before and after; ran expr test suite. Before: [localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem; Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:23:54 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d92070 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.83s After: [localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem; Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 02:23:33 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=97438c98a3d950a4:1a45eeb9 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30 --- M be/src/benchmarks/overflow-benchmark.cc M be/src/common/init.cc M be/src/exprs/expr-value.h M be/src/runtime/decimal-test.cc M be/src/util/CMakeLists.txt D be/src/util/decimal-util.cc M be/src/util/decimal-util.h 7 files changed, 19 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/7 -- To view, visit http://gerrit.cloudera.org:8080/5902 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Zach Amsden has uploaded a new patch set (#2). Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. IMPALA-4936 and IMPALA-4915: Fix decimal overflow test The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: updated expr test with correctly fixed test cases. For some reason, it seemed the test passed before that, and I found that it wasn't correctly validating some of the NULL cases. Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h 2 files changed, 14 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/6068/2 -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Jim Apple has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1121: int r = rand() % state->num_samples; I know you didn't author this line, but could you switch to a better PRNG than rand, like http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine? -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 147 insertions(+), 34
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 147 insertions(+), 34
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Zach Amsden has uploaded a new change for review. http://gerrit.cloudera.org:8080/6068 Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. IMPALA-4936 and IMPALA-4915: Fix decimal overflow test The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: updated expr test with correctly fixed test cases. For some reason, it seemed the test passed before that, so it is possible that it isn't correctly validating some of the non decimal V2 clauses. Will investigate further. Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h 2 files changed, 7 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/6068/1 -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/6025/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1025: // can fit 10 times as many samples, up to a maximum of MAX_NUM_SAMPLES. We do not > update references to '10' Done Line 1026: // allocate room for MAX_NUM_SAMPLES samples up front in order to conserve memory. > in order to -> to Done Line 1031: if (new_capacity * 2 >= MAX_NUM_SAMPLES) new_capacity = MAX_NUM_SAMPLES; > DCHECK_LT(state->capacity, new_capacity); Done Line 1051: // ReservoirSamples. Initially, the array has enough room for MAX_NUM_SAMPLES / 1000 > update to reflect the new initial size Done Line 1063: *state = ReservoirSampleState(); > It would be clearer what's happening if we added a constructor for Reservoi Done Line 1076: // If the container is full, increase it's size by 10x. > update 10 Done -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4926: Upgrade LZ4 to 1.7.5
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4926: Upgrade LZ4 to 1.7.5 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. IMPALA-4546: Fix Moscow timezone conversion after 2014 In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4 with no DST. A special case has been added to timestamp functions to handle this. Testing: Added BE Expr tests. Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h 4 files changed, 114 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/5 -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. IMPALA-4546: Fix Moscow timezone conversion after 2014 In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4 with no DST. A special case has been added to timestamp functions to handle this. Testing: Added BE Expr tests. Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h 4 files changed, 114 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/5 -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/5969/2/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 195: // March 27, 2011 Moscow time transitioned to UTC+4 with no DST. NOTE: We currently > we do not handle them (error/NULL?), or do we not handle them correctly? We do not handle correctly (see patch 4) Line 199: return TIMEZONE_MSK_PRE_2011_DST; > mention the time also 22:00:000 UTC because we check for that below Done Line 209: } > might be clearer to have const variable here int transition_hour_utc = 22 a Done -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Inline big number strings
Jim Apple has posted comments on this change. Change subject: IMPALA-2020: Inline big number strings .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/5902/5//COMMIT_MSG Commit Message: PS5, Line 27: Query: select sum(cast(cast(l_extendedprice as double) as : decimal(14,2))) from tpch10_parquet.lineitem : Query submitted at: 2017-02-18 01:23:54 (Coordinator: : http://impala-dev:25000) : Query progress can be monitored at: : http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d92070 elide PS5, Line 38: 0.83s error bars? http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/exprs/expr-value.h File be/src/exprs/expr-value.h: Line 177: decimal4_val = +DecimalUtil::MAX_UNSCALED_DECIMAL4; > Unary plus necessary to convert to an Rvalue Why do you want to convert it to an Rvalue? Just to avoid having to define it in a .cc file? If so, does defining (not declaring or initializing) in a .cc file slow down performance? Also, add that comment in the file for future readers. http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/util/decimal-util.h File be/src/util/decimal-util.h: Line 57: template Describe what it does. Maybe call it Pow10 and call the parameter exponent? private: PS5, Line 58: T Always is_integral? PS5, Line 59: boost:: static_assert is now part of the language, but I'm not sure if it works on parameters, since parameters can't be constexpr. Line 60: return scale == 0 ? 1 : 10 * GetConstScaleMultiplier(scale-1); Check for no overflow, if it can be done simply. Line 136: if (__builtin_const_p(scale) && p < 10) return GetConstScaleMultiplier(scale); Does this by itself speed anything up? If scale is constant at compile time, since this is inlined, won't it just inline to the right value? https://godbolt.org/g/Tlfl8l -- To view, visit http://gerrit.cloudera.org:8080/5902 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Alex Behm has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 4: (6 comments) I'm pretty happy with this patch. http://gerrit.cloudera.org:8080/#/c/6025/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1025: // can fit 10 times as many samples, up to a maximum of MAX_NUM_SAMPLES. We do not update references to '10' Line 1026: // allocate room for MAX_NUM_SAMPLES samples up front in order to conserve memory. in order to -> to Line 1031: if (new_capacity * 2 >= MAX_NUM_SAMPLES) new_capacity = MAX_NUM_SAMPLES; DCHECK_LT(state->capacity, new_capacity); in case someone decides to significantly muck with the sampling constants such that this can overflow Line 1051: // ReservoirSamples. Initially, the array has enough room for MAX_NUM_SAMPLES / 1000 update to reflect the new initial size Line 1063: *state = ReservoirSampleState(); It would be clearer what's happening if we added a constructor for ReservoirSampleState that initializes the struct fields. Right now, it appears that num_samples and source_size could be uninitialized. Line 1076: // If the container is full, increase it's size by 10x. update 10 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Inline big number strings
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Inline big number strings .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/exprs/expr-value.h File be/src/exprs/expr-value.h: Line 177: decimal4_val = +DecimalUtil::MAX_UNSCALED_DECIMAL4; Unary plus necessary to convert to an Rvalue http://gerrit.cloudera.org:8080/#/c/5902/4/be/src/util/decimal-util.cc File be/src/util/decimal-util.cc: Line 25 > If it's going to be this specific, why not Let's go full tilt and actually fully inline this :) -- To view, visit http://gerrit.cloudera.org:8080/5902 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Inline big number strings
Zach Amsden has uploaded a new patch set (#5). Change subject: IMPALA-2020: Inline big number strings .. IMPALA-2020: Inline big number strings We can directly spcecify these in the header instead of keeping the constants out in a .o. This lets the compiler inline them in a lot more places, potentially giving better behavior than a table lookup by directly inserting the constant. This in turn allows divides by now known compile time constants to be converted into multiply. Avoiding the table lookup seems to be a win for other types as well, giving anywhere from 18% to ~33% speedup over baseline. This is likely to become even more imporant with big integer divide, which requires additional computation for DECIMAL_V2. Testing: Compare the binary output of decimal-util.cc before and after; ran expr test suite. Before: [localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem; Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:23:54 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d92070 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.83s After: [localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem; Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 02:23:33 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=97438c98a3d950a4:1a45eeb9 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30 --- M be/src/benchmarks/overflow-benchmark.cc M be/src/common/init.cc M be/src/exprs/expr-value.h M be/src/runtime/decimal-test.cc M be/src/util/CMakeLists.txt D be/src/util/decimal-util.cc M be/src/util/decimal-util.h 7 files changed, 20 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/5 -- To view, visit http://gerrit.cloudera.org:8080/5902 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 4: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/284/ -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Alex Behm has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS2, Line 1024: new_size > Queries with a large number buckets will run out of memory much quicker tha 2x works for me but: For buckets with many samples you spend more time reallocating, and the peak memory consumption can be higher (space for 1.5 * MAX_NUM_SAMPLES). See the experiment Taras posted in the commit msg. I'm fine with the 2x behavior, just pointing out the tradeoffs. -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 141 insertions(+), 33
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 141 insertions(+), 33
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the
Laurel Hale has uploaded a new change for review. http://gerrit.cloudera.org:8080/6067 Change subject: IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the XML, but not in the rendered documentation. A later project will remove .. IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the XML, but not in the rendered documentation. A later project will remove all occurrences of CM from the XML. Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d --- M docs/topics/impala_impala_shell.xml M docs/topics/impala_proxy.xml M docs/topics/impala_resource_management.xml M docs/topics/impala_timeouts.xml M docs/topics/impala_udf.xml 5 files changed, 95 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/6067/1 -- To view, visit http://gerrit.cloudera.org:8080/6067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#15). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. I have decided to wait on expanding the python test until the bugs with overflow are fixed, which will make it easier to test sane behavior. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Note there is no free lunch: [localhost:21000] > set decimal_v2=0; DECIMAL_V2 set to 0 [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:16:49 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=c546f270316b2176:820bb667 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.76s [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:16:52 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=524bf2693849ce99:be999b03 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.73s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:16:59 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=ca4f8413061576d9:2389ccde +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.85s [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-18 01:17:02 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=4d4b5f9f181306c7:f6726c6 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.96s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: [localhost:21000] > set decimal_v2=0;
[Impala-ASF-CR] Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Mana
Laurel Hale has posted comments on this change. Change subject: Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Manager' was removed and sections were rewritten. This clean-up task does n .. Patch Set 1: Forgot to prepend "IMPALA-3401 [DOCS]. Will not forget next time. -- To view, visit http://gerrit.cloudera.org:8080/6064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel HaleGerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4926: Upgrade LZ4 to 1.7.5
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4926: Upgrade LZ4 to 1.7.5 .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/285/ -- To view, visit http://gerrit.cloudera.org:8080/6030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Mana
Laurel Hale has uploaded a new change for review. http://gerrit.cloudera.org:8080/6064 Change subject: Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Manager' was removed and sections were rewritten. This clean-up task does n .. Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Manager' was removed and sections were rewritten. This clean-up task does not completely remove 'Cloudera Manager' from the upstream docs, but it is now hidden in these files so it won't show up in the rendered version. Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0 --- M docs/topics/impala_admission.xml M docs/topics/impala_config_performance.xml M docs/topics/impala_noncm_installation.xml M docs/topics/impala_prereqs.xml M docs/topics/impala_tutorial.xml 5 files changed, 72 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/6064/1 -- To view, visit http://gerrit.cloudera.org:8080/6064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 4: PS4 adds the "statistics predicates" to the PlannerTest and removes the custom predicate normalization from KuduScanNode.java, after it is now done during query rewrite. -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has uploaded a new patch set (#4). Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the formsor , where is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exprs/expr.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py 27 files changed, 729 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/4 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Alex Behm has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/6038/3//COMMIT_MSG Commit Message: Line 12: 2. precision will be augmented if the scale is adjusted. In addition to my other comment, I think we should mention some of the limitations here, for example, that it won't be possible to compute the AVG() of some decimal types at all. http://gerrit.cloudera.org:8080/#/c/6038/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 337: // an implicit divide. This is similar to the MS SQL server's behavior which takes Suggest these minor fixes and reordering: AVG() gets at least MIN_ADJUSTED_SCALE decimal places since it performs an implicit divide. However, the output type is not always the same as SUM()/COUNT(). Our behavior is similar to MS SQL Server's which takes the max of the input's scale and MIN_ADJUSTED_SCALE. The observations are ok, but at a high level: * the type is not the same as SUM()/COUNT() * the behavior is not the same as MS SQL Server's It would be good to provide a justification/intuition for why we chose yet another behavior (also in the commit msg). http://gerrit.cloudera.org:8080/#/c/6038/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test: Line 208: We should also add tests for those decimal types where AVG() cannot be computed (because we always overflow). -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner (with the exception of KUDU-1881), but the issue is that during planning, Impala assumes the types/nullability of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. Test cases were added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Reviewed-on: http://gerrit.cloudera.org:8080/5840 Reviewed-by: Matthew JacobsTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/common/kudu_test_suite.py M tests/query_test/test_kudu.py 3 files changed, 227 insertions(+), 6 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 04:EXCHANGE 11.869ms 1.869ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE3 34.225ms 37.978ms 14.82K 15.21K 20.67 MB 10.00 MB FINALIZE 02:EXCHANGE 3 860.854us 1.024ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 74.048ms 93.605ms 15.60K 15.21K 9.99 MB 10.00 MB STREAMING 00:SCAN HDFS3 68.162ms 78.181ms 183.59K 183.59K 3.84 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 141 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS2, Line 1036: state->max_num_samples = new_size; > The previous state gets copied in ReallocBuffer. The only thing we need to Ah, didn't remember the details of reallocate. Comment will be helpful, thanks. PS2, Line 1167: (dst->num_samples < MAX_NUM_SAMPLES > If DST does not have enough capacity for MAX_NUM_SAMPLES, it will be resize I see, I missed that call before. Thanks. -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 04:EXCHANGE 11.869ms 1.869ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE3 34.225ms 37.978ms 14.82K 15.21K 20.67 MB 10.00 MB FINALIZE 02:EXCHANGE 3 860.854us 1.024ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 74.048ms 93.605ms 15.60K 15.21K 9.99 MB 10.00 MB STREAMING 00:SCAN HDFS3 68.162ms 78.181ms 183.59K 183.59K 3.84 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 141 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 04:EXCHANGE 11.869ms 1.869ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE3 34.225ms 37.978ms 14.82K 15.21K 20.67 MB 10.00 MB FINALIZE 02:EXCHANGE 3 860.854us 1.024ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 74.048ms 93.605ms 15.60K 15.21K 9.99 MB 10.00 MB STREAMING 00:SCAN HDFS3 68.162ms 78.181ms 183.59K 183.59K 3.84 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 141 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 2: (17 comments) Marcel, I will run the targeted perf query later today. http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS2, Line 963: max_num_samples > confusing to have MAX_NUM_SAMPLES as well. maybe this should be capacity Done PS2, Line 973: { return begin() + idx; } > it'd be good to add bounds checks w/ dchecks Done PS2, Line 984: max_num_samples > capacity Done Line 996: bool IsFull() const { > dcheck that num_samples <= capacity, i.e. that it is NOT over Done PS2, Line 990: size_t byte_size() const { : return byte_size(max_num_samples); : } : : // Returns true if the array of ReservoirSamples that follows this struct in memory is : // full, and no more elements can be pushed back without resizing. : bool IsFull() const { : return num_samples == max_num_samples; : } > i think both of these can be one line One lined one of them, added a DCHECK to the other. PS2, Line 1001: // make this const > ? Done PS2, Line 1018: resizeReservoirSampleState > nit: inconsistent casing in fn names Make this one start with a capital letter. But other ones like begin() should probably start with a lowercase. PS2, Line 1021: N_ > NUM Done PS2, Line 1024: new_size > new_capacity to avoid conflating with memory sizes Done PS2, Line 1024: new_size > Queries with a large number buckets will run out of memory much quicker tha Done PS2, Line 1024: int new_size = state->max_num_samples * 10; : DCHECK_EQ(state->max_num_samples, state->num_samples); : DCHECK_LE(new_size, MAX_NUM_SAMPLES); > Looks like it'll dcheck when capacity reaches MAX_NUM_SAMPLES. Shouldn't th In this implementation it couldn't DCHECK, because we would start with capacity 20, then go to 200, 2000, and finally 2. This is now changed because Mostafa suggested doubling instead. PS2, Line 1036: state->max_num_samples = new_size; > why don't any other fields need to be initialized? e.g. the rng will be gar The previous state gets copied in ReallocBuffer. The only thing we need to change is is the capacity. Added comment. PS2, Line 1048: size > again I'd vote for capacity I got rid of this variable. PS2, Line 1058: *state = ReservoirSampleState(); > this looks like it'll initialize the memory properly. I think we'd probably We don't need to do that on line 1035 because everything gets copied over in ReallocBuffer() PS2, Line 1084: ++state->source_size; > I think this gets lost when you allocate a new State. As mentioned above, everything is copied in realloc PS2, Line 1152: ReservoirSampleMerge > I think this algorithm will require some changes to handle varying length S The capacity shouldn't matter in this algorithm. This algorithm only cares about num samples in src and dst states. PS2, Line 1167: (dst->num_samples < MAX_NUM_SAMPLES > I don't think you can rely on this anymore, dst may not have capacity MAX_N If DST does not have enough capacity for MAX_NUM_SAMPLES, it will be resized on line 1170 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) publishing Impala 2.8 rendered docs
Laurel Hale has uploaded a new change for review. http://gerrit.cloudera.org:8080/6062 Change subject: publishing Impala 2.8 rendered docs .. publishing Impala 2.8 rendered docs Change-Id: Ied1bbe1650e54a045fbf48b3f91867fd15a1525b --- A docs/build/html/commonltr.css A docs/build/html/commonrtl.css A docs/build/html/images/howto_access_control.png A docs/build/html/images/howto_per_node_peak_memory_usage.png A docs/build/html/images/howto_show_histogram.png A docs/build/html/images/howto_static_server_pools_config.png A docs/build/html/images/impala_arch.jpeg A docs/build/html/images/support_send_diagnostic_data.png A docs/build/html/index.html A docs/build/html/shared/ImpalaVariables.html A docs/build/html/shared/impala_common.html A docs/build/html/topics/impala.html A docs/build/html/topics/impala_abort_on_default_limit_exceeded.html A docs/build/html/topics/impala_abort_on_error.html A docs/build/html/topics/impala_admin.html A docs/build/html/topics/impala_admission.html A docs/build/html/topics/impala_aggregate_functions.html A docs/build/html/topics/impala_aliases.html A docs/build/html/topics/impala_allow_unsupported_formats.html A docs/build/html/topics/impala_alter_table.html A docs/build/html/topics/impala_alter_view.html A docs/build/html/topics/impala_analytic_functions.html A docs/build/html/topics/impala_appx_count_distinct.html A docs/build/html/topics/impala_appx_median.html A docs/build/html/topics/impala_array.html A docs/build/html/topics/impala_auditing.html A docs/build/html/topics/impala_authentication.html A docs/build/html/topics/impala_authorization.html A docs/build/html/topics/impala_avg.html A docs/build/html/topics/impala_avro.html A docs/build/html/topics/impala_batch_size.html A docs/build/html/topics/impala_bigint.html A docs/build/html/topics/impala_bit_functions.html A docs/build/html/topics/impala_boolean.html A docs/build/html/topics/impala_breakpad.html A docs/build/html/topics/impala_char.html A docs/build/html/topics/impala_cluster_sizing.html A docs/build/html/topics/impala_comments.html A docs/build/html/topics/impala_complex_types.html A docs/build/html/topics/impala_components.html A docs/build/html/topics/impala_compression_codec.html A docs/build/html/topics/impala_compute_stats.html A docs/build/html/topics/impala_concepts.html A docs/build/html/topics/impala_conditional_functions.html A docs/build/html/topics/impala_config.html A docs/build/html/topics/impala_config_options.html A docs/build/html/topics/impala_config_performance.html A docs/build/html/topics/impala_connecting.html A docs/build/html/topics/impala_conversion_functions.html A docs/build/html/topics/impala_count.html A docs/build/html/topics/impala_create_database.html A docs/build/html/topics/impala_create_function.html A docs/build/html/topics/impala_create_role.html A docs/build/html/topics/impala_create_table.html A docs/build/html/topics/impala_create_view.html A docs/build/html/topics/impala_databases.html A docs/build/html/topics/impala_datatypes.html A docs/build/html/topics/impala_datetime_functions.html A docs/build/html/topics/impala_ddl.html A docs/build/html/topics/impala_debug_action.html A docs/build/html/topics/impala_decimal.html A docs/build/html/topics/impala_default_order_by_limit.html A docs/build/html/topics/impala_delegation.html A docs/build/html/topics/impala_delete.html A docs/build/html/topics/impala_describe.html A docs/build/html/topics/impala_development.html A docs/build/html/topics/impala_disable_codegen.html A docs/build/html/topics/impala_disable_row_runtime_filtering.html A docs/build/html/topics/impala_disable_streaming_preaggregations.html A docs/build/html/topics/impala_disable_unsafe_spills.html A docs/build/html/topics/impala_disk_space.html A docs/build/html/topics/impala_distinct.html A docs/build/html/topics/impala_dml.html A docs/build/html/topics/impala_double.html A docs/build/html/topics/impala_drop_database.html A docs/build/html/topics/impala_drop_function.html A docs/build/html/topics/impala_drop_role.html A docs/build/html/topics/impala_drop_stats.html A docs/build/html/topics/impala_drop_table.html A docs/build/html/topics/impala_drop_view.html A docs/build/html/topics/impala_exec_single_node_rows_threshold.html A docs/build/html/topics/impala_explain.html A docs/build/html/topics/impala_explain_level.html A docs/build/html/topics/impala_explain_plan.html A docs/build/html/topics/impala_faq.html A docs/build/html/topics/impala_file_formats.html A docs/build/html/topics/impala_fixed_issues.html A docs/build/html/topics/impala_float.html A docs/build/html/topics/impala_functions.html A docs/build/html/topics/impala_functions_overview.html A docs/build/html/topics/impala_grant.html A docs/build/html/topics/impala_group_by.html A docs/build/html/topics/impala_group_concat.html A docs/build/html/topics/impala_hadoop.html A docs/build/html/topics/impala_having.html A
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5969/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 3720: TestStringValue(UTC_TO_MSC("2014-10-25 22:00:00"), "2014-10-26 01:00:00"); > What do you think should be done about this? After speaking to Greg, we decided that we should return NULL for that 1 hour to be consistent. -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 1: (10 comments) Thanks for the suggestions Henry. New screenshots are here: https://drive.google.com/file/d/0B7VW0hRyzVlKTzZPR3VEVlVEYWc/view?usp=sharing https://drive.google.com/file/d/0B7VW0hRyzVlKOXowTUM2QWVySm8/view?usp=sharing http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.cc File be/src/util/thread.cc: Line 178: void ThreadOverviewUrlCallback(bool track_jvm_threads, > Shouldn't this go above ThreadGroupUrlCallback() to match with the sample j No, this is the output of the overview callback. PS1, Line 194: [this, track_jvm_threads] (const Webserver::ArgumentMap& args, Document* doc) { : this->ThreadOverviewUrlCallback(track_jvm_threads, args, doc) > Make this a method? MakeUrlCallBack or something, used multiple times. This has changed. PS1, Line 209: RegisterUrlCallback > Rather than adding another top-level link, consider making two tabs on the Done PS1, Line 234: bool track_jvm_threads, > how about just making this a member of ThreadMgr? Actually - I don't think Done Line 260: << status.GetDetail(); > return, then remove else { } Done PS1, Line 262: jvmThreadsVal > C++ naming styles power of habit.. :) Done PS1, Line 327: INFO > warn/error? How about bubbling this error up to the web endpoint by setting Changed it to WARN. What is the "error" member? http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.h File be/src/util/thread.h: Line 191: /// the "thread-manager." > Describe track_jvm_threads in the method comment? Done http://gerrit.cloudera.org:8080/#/c/6013/1/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: Line 720: // Summary of a JVM thread. Includes stacktraces, locked monitors > Do you think its good to include the locked synchronizers or the lock it is It already does that. http://gerrit.cloudera.org:8080/#/c/6013/1/www/jvm-threadz.tmpl File www/jvm-threadz.tmpl: PS1, Line 35: Is n > Native Done -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
Hello David Knupp, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6002 to look at the new patch set (#5). Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests .. IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests This patch allows any test suites whose workload is "targeted-stress" to be run in so-called "exhaustive" mode. Before this patch, only suites whose workload was "functional-query" would be run exhaustively. More on this flaw is in IMPALA-3947. The net effects are: 1. We fix IMPALA-4904, which allows test_ddl_stress to start running again. 2. We also improve the situation in IMPALA-4914 by getting TestSpillStress to run, though we don't fix its not-running-concurrently problem. The other mini-cluster stress tests were disabled in this commit: IMPALA-2605: Omit the sort and mini stress tests so they are not directly affected here. I also tried to clarify what "exhaustive" means in some of our shell scripts, via help text and comments. An exhaustive build+test run showed test_ddl_stress and TestSpillStress now get run and passed. This adds roughly 12 minutes to a build that's on the order of 13-14 hours. Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec --- M bin/run-all-tests.sh M buildall.sh M tests/custom_cluster/test_spilling.py M tests/stress/test_ddl_stress.py 4 files changed, 67 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6002/5 -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR](asf-site) publishing Impala 2.8 rendered docs
Laurel Hale has abandoned this change. Change subject: publishing Impala 2.8 rendered docs .. Abandoned practicing with Jim to publish on Impala ASF site -- To view, visit http://gerrit.cloudera.org:8080/6062 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ied1bbe1650e54a045fbf48b3f91867fd15a1525b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Laurel HaleGerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/284/ -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
Michael Brown has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG Commit Message: Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests > Thanks for the point Henry. I once saw on an Impala code review (I can't fi Done http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py File tests/stress/test_ddl_stress.py: Line 59 > Given the number of catalog race conditions we tend to see, I think it is p TEST_IDS (LHS L45) is equivalent to RHS L52. The RHS is simply the standard py.test way of going it. The real difference here is the introduction of unique_database, which I agree removes the potential of "CREATE DATABASE IF NOT EXISTS ddl_stress_testdb" to be run concurrently. From a test perspective, partially reverting this to the previous behavior should not cause the test to fail. Let me address that in a separate patch set. -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) publishing Impala 2.8 rendered docs
Jim Apple has posted comments on this change. Change subject: publishing Impala 2.8 rendered docs .. Patch Set 1: Code-Review-2 Verified-1 Practicing with Laurel -- To view, visit http://gerrit.cloudera.org:8080/6062 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied1bbe1650e54a045fbf48b3f91867fd15a1525b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Laurel HaleGerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#13). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M be/src/udf/udf.h 9 files changed, 467 insertions(+), 111 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/13 -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. IMPALA-4885: Expose Jvm thread info in web UI This commit exposes information about JVM threads to the impalad and catalogd web UIs. This information includes statistics about the number of threads running in the JVM as well as per-thread stacktraces, monitors and synchronizers. Total CPU, user CPU and blocked time is also reported per thread. Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f --- M be/src/catalog/catalogd-main.cc M be/src/service/impalad-main.cc M be/src/statestore/statestored-main.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/JniUtil.java A www/jvm-threadz.tmpl M www/threadz.tmpl A www/threadz_tabs.tmpl 12 files changed, 401 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/2 -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests
Michael Brown has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG Commit Message: Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests > Although I'm not keen on the strict 72-char limit (we have many other commi Thanks for the point Henry. I once saw on an Impala code review (I can't find it to cite it) a fair refute of this idea: writing Git commit headers like "IMPALA-{4904, 4914}" was a bad idea, because if someone needs to search Git logs for IMPALA-4914, he won't match this particular header. -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests
Henry Robinson has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG Commit Message: Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests > I agree this loses some information, but I think it's the right tradeoff: Although I'm not keen on the strict 72-char limit (we have many other commit messages that go over this - what's the reason for enforcing it?) - note you can save some chars by compressing the list of fixed JIRAs: IMPALA-{4904, 4914}: etc. -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests
Jim Apple has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG Commit Message: Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests > Can you please suggest a meaningful header that summarizes the patch that a I agree this loses some information, but I think it's the right tradeoff: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py File tests/stress/test_ddl_stress.py: Line 59 > No change in ordering of DDL calls. The CREATE DATABASE that is executed no Given the number of catalog race conditions we tend to see, I think it is possible that this changes the test meaning. My feeling is that it would be good to test both things: multiple tables in one DB and multiple tables each with its own DB. If this method and TEST_IDS were left unchanged, would the test fail? -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests
Michael Brown has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests .. Patch Set 4: Code-Review+1 (9 comments) Thanks for the review, Jim. Patch set 4 does not have any functional changes relative to patch set 3. Some of your comments I answered by making changes to the comments in code; most I answer as comments in Gerrit. Carry +1 http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG Commit Message: Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests > long line Can you please suggest a meaningful header that summarizes the patch that also fits in 72 columns? "Long line" on its own isn't helpful. I realize it's long, but I'm not sure how to shorten it without loss of information. Line 29: An exhaustive build+test run showed test_ddl_stress and TestSpillStress > Roughly how much longer does this make exhaustive runs? Done http://gerrit.cloudera.org:8080/#/c/6002/3/bin/run-all-tests.sh File bin/run-all-tests.sh: PS3, Line 100: XXX > For my education: what do the three X's mean Vim highlights it (like how it highlights "TODO"). I like to use them as notices. PS3, Line 102: ` > I don't think I've seen this idiom before. I recognize it is in HEAD alread That's the gist. I expect it's either a Martinism or Caseyism. I like it: it allows for clearer formatting and organization of multi-line quoted text on RHS of an assignment. Note that parentheses won't work, nor will backslash. http://gerrit.cloudera.org:8080/#/c/6002/3/buildall.sh File buildall.sh: Line 188:"its dependencies. If already running, all services are restarted."\ > What happened to the spaces? I noticed this when running buildall.sh -help They are not needed and ./buidall.sh -help will show two spaces (on master). See this gist for more info, if you care. :) I don't want to clutter the review with the explanation. https://gist.github.com/mikesbrown/ca0c5e4a2ba13dd929f72c57adf4e8bc PS3, Line 196: ONLY APPLIES to suites with workloads:"\ :"functional-query, targeted-stress" > This seems likely to become stale if that changes, as it may not show up to I think it's important to convey it *somewhere* what "running exhaustively" actually means. buildall.sh is the common entry point to our system, and we already have help text here. I simply clarified the help text. I also left a notice (that "XXX" you asked about) in hopes people will be prompted to update it. http://gerrit.cloudera.org:8080/#/c/6002/3/tests/custom_cluster/test_spilling.py File tests/custom_cluster/test_spilling.py: PS3, Line 56: ) and ca > Would it be worthwhile or even possible to fix it to be more typical? If so Done PS3, Line 66: equivalent > This one I don't quite understand. The CustomClusterTestSuite presumably sh Done http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py File tests/stress/test_ddl_stress.py: Line 59 > Does removing this (to pytest.mark.parametrize) change the possible orderin No change in ordering of DDL calls. The CREATE DATABASE that is executed now is part of the CREATE DATABASE from the unique_database fixture before the test proper starts. The difference is each test will use its own unique database instead of sharing across "ddl_stres_testdb". -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests
Hello David Knupp, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6002 to look at the new patch set (#4). Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests .. IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests This patch allows any test suites whose workload is "targeted-stress" to be run in so-called "exhaustive" mode. Before this patch, only suites whose workload was "functional-query" would be run exhaustively. More on this flaw is in IMPALA-3947. The net effects are: 1. We fix IMPALA-4904, which allows test_ddl_stress to start running again. 2. We also improve the situation in IMPALA-4914 by getting TestSpillStress to run, though we don't fix its not-running-concurrently problem. The other mini-cluster stress tests were disabled in this commit: IMPALA-2605: Omit the sort and mini stress tests so they are not directly affected here. I also tried to clarify what "exhaustive" means in some of our shell scripts, via help text and comments. An exhaustive build+test run showed test_ddl_stress and TestSpillStress now get run and passed. This adds roughly 12 minutes to a build that's on the order of 13-14 hours. Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec --- M bin/run-all-tests.sh M buildall.sh M tests/custom_cluster/test_spilling.py M tests/stress/test_ddl_stress.py 4 files changed, 67 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6002/4 -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Michael Ho has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6038/2/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test: PS2, Line 197: 0 > Yes, I tried verifying with sum()/num_rows(). That's why I used tpch_parque I meant I verified with sum(col)/count(*). -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6038 to look at the new patch set (#3). Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. IMPALA-4821: Update AVG() for DECIMAL_V2 This change implements the DECIMAL_V2's behavior for AVG(). The major differences are: 1. output type's scale has a minimum of 6 2. precision will be augmented if the scale is adjusted. Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/expr-test.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test M testdata/workloads/functional-query/queries/QueryTest/decimal.test 5 files changed, 231 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/3 -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Michael Ho has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/6038/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 447: // The scale of the accumulated sum is set to the scale of the output type. > I think this comment would be better in the frontend code since that's wher Done http://gerrit.cloudera.org:8080/#/c/6038/2/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 336: // AVG() always get at least MIN_ADJUSTED_SCALE decimal places since it performs > gets Done Line 338: int resultScale = Math.max(ScalarType.MIN_ADJUSTED_SCALE, digitsAfter); > Is the resulting type the same as if we did (SUM(col) / COUNT(col))? In any Done http://gerrit.cloudera.org:8080/#/c/6038/2/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test: PS2, Line 197: 0 > do these results make sense? seems surprising that this decimal place is al Yes, I tried verifying with sum()/num_rows(). That's why I used tpch_parquet.lineitem to help verify the result. -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Joe McDonnell has uploaded a new patch set (#8). Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. IMPALA-4624: Implement Parquet dictionary filtering Here is a basic summary of the changes: Frontend looks for conjuncts that operate on a single slot and pass a map from slot id to the conjunct index through thrift to the backend. The conjunct indices are the incides into the normal PlanNode conjuncts list. The conjuncts need to satisfy certain conditions: 1. They are bound on a single slot 2. They are deterministic (no random functions) 3. They evaluate to FALSE on a NULL input. This is because the dictionary does not include NULLs, so any condition that evaluates to TRUE on NULL cannot be evaluated by looking only at the dictionary. The backend converts the indices into ExprContexts. These are cloned in the scanner threads. The dictionary read codepath has been removed from ReadDataPage into its own function, InitDictionary. This has also been turned into its own step in row group initialization. ReadDataPage will not see any dictionary pages unless the parquet file is invalid. For dictionary filtering, we initialize dictionaries only as needed to evaluate the conjuncts. The Parquet scanner evaluates the single slot conjuncts on the dictionary to see if any dictionary entry passes. If no entry passes, the row group is eliminated. If the row group passes the dictionary filtering, then we initialize all remaining dictionaries. Since column chunks can have a mixture of encodings, dictionary filtering uses three tests to determine whether this is purely dictionary encoded: 1. If the encoding_stats is in the parquet file, then use it to determine if there are only dictionary encoded pages (i.e. there are no data pages with an encoding other than PLAIN_DICTIONARY). -OR- 2. If the encoding stats are not present, then look at the encodings. The column is purely dictionary encoded if: a) PLAIN_DICTIONARY is present AND b) Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed -OR- 3. If this file was written by an older version of Impala, then we know that dictionary failover happens when the dictionary reaches 40,000 values. Dictionary filtering can proceed as long as the dictionary is smaller than that. parquet-mr writes the encoding list correctly in the current version in our environment (1.5.0). This means that check #2 works on some existing files (potentially most existing parquet-mr files). parquet-mr writes the encoding stats starting in 1.9.0. This is the version where check #1 will start working. Impala's parquet writer now implements both, so either check above will work. Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/util/dict-encoding.h M be/src/util/dict-test.cc M common/thrift/PlanNodes.thrift M common/thrift/parquet.thrift M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test A testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test M tests/query_test/test_scanners.py 19 files changed, 1,056 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/5904/8 -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 2: Taras, why don't you run a targeted perf query to measure runtime overhead of 2x vs. 10x resizing. -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 12: (2 comments) http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS12, Line 49: double > shouldn't this be T? No, T may not have enough precision to hold the value of 10^scale - look at the old code, which also does the multiplication in double. PS12, Line 52: max_value < 0 > why is the max_value < 0 check needed? If we cast to a decimal type whose underlying width is not large enough to hold the precision, this will become true. Make it a DCHECK? This should be impossible, guaranteed by the type system. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests
Jim Apple has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG Commit Message: Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests long line Line 29: An exhaustive build+test run showed test_ddl_stress and TestSpillStress Roughly how much longer does this make exhaustive runs? http://gerrit.cloudera.org:8080/#/c/6002/3/bin/run-all-tests.sh File bin/run-all-tests.sh: PS3, Line 100: ` For my education: what do the three X's mean PS3, Line 102: I don't think I've seen this idiom before. I recognize it is in HEAD already, but, for the purposes of my own education, is this command substitution where the command contains only whitespace, thereby just acting like "remove these backticks and all the space between them"? http://gerrit.cloudera.org:8080/#/c/6002/3/buildall.sh File buildall.sh: Line 188:"its dependencies. If already running, all services are restarted."\ What happened to the spaces? PS3, Line 196: ONLY APPLIES to suites with workloads:"\ :"functional-query, targeted-stress" This seems likely to become stale if that changes, as it may not show up to the author of a change to run-all-tests.sh. Is there anything to be done about that? http://gerrit.cloudera.org:8080/#/c/6002/3/tests/custom_cluster/test_spilling.py File tests/custom_cluster/test_spilling.py: PS3, Line 56: Would it be worthwhile or even possible to fix it to be more typical? If so, should that be this patch or should it be a new JIRA? PS3, Line 66: This one I don't quite understand. The CustomClusterTestSuite presumably shuts down impalad, but that's OK once this test is done, yes? http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py File tests/stress/test_ddl_stress.py: Line 59 Does removing this (to pytest.mark.parametrize) change the possible orderings of the DDL calls? -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Henry Robinson has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6056/3/bin/impala-config.sh File bin/impala-config.sh: PS3, Line 75: IMPALA_TOOLCHAIN_BUILD_ID > I actually don't need to because I built this Kudu with a new jenkins job ( Sounds ok. If you're revving nightly, it might be better just to have a 'latest' symlink (just like a SNAPSHOT version) than to make a commit every night, but whatever is easiest. -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Matthew Jacobs has uploaded a new patch set (#4). Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. IMPALA-4934: Disable Kudu OpenSSL initialization Bumps the Kudu version to include the change to the client that allows Impala to disable SSL initialization. In authentication.cc, after Impala initializes OpenSSL, Impala then disables Kudu's OpenSSL init. Change-Id: I3f13f3af512c6d771979638da593685524c73086 --- M be/src/rpc/authentication.cc M bin/impala-config.sh 2 files changed, 10 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/6056/4 -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/6056/3/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: PS3, Line 663: impala > remove? Done http://gerrit.cloudera.org:8080/#/c/6056/3/bin/impala-config.sh File bin/impala-config.sh: PS3, Line 75: IMPALA_TOOLCHAIN_BUILD_ID > Don't you need to change this as well? I actually don't need to because I built this Kudu with a new jenkins job (script still in review [1]) which builds Kudu with an existing toolchain, and adds the newer Kudu. I still had to add the gerrit commit to the toolchain buildall.sh (as you saw) so that future toolchain builds build the latest Kudu version. The goal of this is to start revving the Kudu versions more regularly, perhaps at least testing nightly. 1: https://gerrit.cloudera.org/#/c/6014/ -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Henry Robinson has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/6056/3/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: PS3, Line 663: impala remove? http://gerrit.cloudera.org:8080/#/c/6056/3/bin/impala-config.sh File bin/impala-config.sh: PS3, Line 75: IMPALA_TOOLCHAIN_BUILD_ID Don't you need to change this as well? -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4926: Upgrade LZ4 to 1.7.5
Jim Apple has posted comments on this change. Change subject: IMPALA-4926: Upgrade LZ4 to 1.7.5 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' Rewrote sections to eliminate 'Cloudera Manager' from topics. Look for subsequent phases to remove remaining instances of CM.
Jim Apple has posted comments on this change. Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' Rewrote sections to eliminate 'Cloudera Manager' from topics. Look for subsequent phases to remove remaining instances of CM. .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6049/1//COMMIT_MSG Commit Message: Line 7: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' Rewrote sections to eliminate 'Cloudera Manager' from topics. Look for subsequent phases to remove remaining instances of CM. Commit messages should wrap at 72 characters -- To view, visit http://gerrit.cloudera.org:8080/6049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel HaleGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Hello Thomas Tauber-Marshall, Sailesh Mukil, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5840 to look at the new patch set (#9). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner (with the exception of KUDU-1881), but the issue is that during planning, Impala assumes the types/nullability of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. Test cases were added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/common/kudu_test_suite.py M tests/query_test/test_kudu.py 3 files changed, 227 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/9 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 9: Code-Review+2 fixed spelling and carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No