[Impala-ASF-CR] IMPALA-4282: Remove max length check for type strings.
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4282: Remove max length check for type strings. .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4282: Remove max length check for type strings.
Alex Behm has uploaded a new patch set (#2). Change subject: IMPALA-4282: Remove max length check for type strings. .. IMPALA-4282: Remove max length check for type strings. During analysis, Impala used to enforce the default HMS limit on the number of characters of a type string. That enforcement has the benefit of a clear error message, but has the severe drawback of not having any workaround because the limit is baked into the code. Going above the 4000 limit is pretty easy with complex types, and several users have already run into this issue. This patch removes the Impala-side enforcement such that the 4000 limit can be increased by altering the corresponding columns in the Metastore's backend database. Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d --- M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 3 files changed, 1 insertion(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/6034/2 -- To view, visit http://gerrit.cloudera.org:8080/6034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-4282: Remove max length check for type strings.
Alex Behm has posted comments on this change. Change subject: IMPALA-4282: Remove max length check for type strings. .. Patch Set 2: Ahh right. Thanks. Done. -- To view, visit http://gerrit.cloudera.org:8080/6034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4282: Remove max length check for type strings.
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4282: Remove max length check for type strings. .. Patch Set 1: Looks like you forgot to git-add changes from ColumnDef.java. -- To view, visit http://gerrit.cloudera.org:8080/6034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4282: Remove max length check for type strings.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/6034 Change subject: IMPALA-4282: Remove max length check for type strings. .. IMPALA-4282: Remove max length check for type strings. During analysis, Impala used to enforce the default HMS limit on the number of characters of a type string. That enforcement has the benefit of a clear error message, but has the severe drawback of not having any workaround because the limit is baked into the code. Going above the 4000 limit is pretty easy with complex types, and several users have already run into this issue. This patch removes the Impala-side enforcement such that the 4000 limit can be increased by altering the corresponding column in the Metastore's backend database. Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d --- M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 0 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/6034/1 -- To view, visit http://gerrit.cloudera.org:8080/6034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-4854: Fix incremental stats with complex types.
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4854: Fix incremental stats with complex types. .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6033 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e0335048d688ee25ff55c6628d0f6f8ecc1dd8a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4854: Fix incremental stats with complex types.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/6033 Change subject: IMPALA-4854: Fix incremental stats with complex types. .. IMPALA-4854: Fix incremental stats with complex types. The bug: Compute incremental stats used to always do a full stats recomputation for tables with complex types. The logic for detecting schema changes (e.g. an added column) did not take into consideration that columns with complex types are ignored in the stats computation, and should therefore not be recognized as a new column that does not yet have stats. Testing: - Added a new regression test - Locally ran test_compute_stats.py and the FE tests Change-Id: I6e0335048d688ee25ff55c6628d0f6f8ecc1dd8a --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test 2 files changed, 47 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/6033/1 -- To view, visit http://gerrit.cloudera.org:8080/6033 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6e0335048d688ee25ff55c6628d0f6f8ecc1dd8a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/6032 Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the formsor , where is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exprs/expr.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/rewrite/NormalizeExprsRule.java A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py 15 files changed, 589 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/1 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 11: I have not updated the Python test yet, as that would require a rebase. Python test and benchmark will come in next patch set, this is just to give a clean slate for review. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4926: Upgrade LZ4 to 1.7.5
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/6030 Change subject: IMPALA-4926: Upgrade LZ4 to 1.7.5 .. IMPALA-4926: Upgrade LZ4 to 1.7.5 LZ4 has deprecated the method names LZ4_compress and LZ4_uncompress, so rename those to their new versions LZ4_compress_default() and LZ4_decompress_fast() respectively. Otherwise no changes are required for compatibility with LZ4 1.7.5. Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3 --- M be/src/util/compress.cc M be/src/util/decompress.cc M bin/impala-config.sh 3 files changed, 6 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/6030/1 -- To view, visit http://gerrit.cloudera.org:8080/6030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4840: Fix REFRESH performance regression. .. IMPALA-4840: Fix REFRESH performance regression. The fix for IMPALA-4172 introduced a regression in performance of the REFRESH command. The regression stems from the fact that we reload the block metadata of every valid data file without considering whether it has changed since the last load. This caused unnecessary metadata loads for unchanged files and thus increasing the runtime. The fix involves having the refresh codepath (and other operations that use the same codepath like insert etc.) to reload the metadata of only modified files by doing a listStatus() on the partition directory and checking the last modified time of each file. Without this patch, we relied on listFiles(), which fetched the block locations irrespective of whether the file has changed and it was significantly slower on unchanged tables. The initial/invalidate metadata load still fetches the block locations in bulk using listFiles(). The side effect of this change is that the refresh no longer picks up block location changes after HDFS block rebalancing. We suggest using "invalidate metadata" for that which loads the metadata from scratch. Additionally, this commit enables the reuse of metadata during table refresh (which was disabled in IMPALA-4172) to prevent reloading metadata from HMS everytime. Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Reviewed-on: http://gerrit.cloudera.org:8080/6009 Reviewed-by: Dimitris TsirogiannisTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 2 files changed, 110 insertions(+), 32 deletions(-) Approvals: Impala Public Jenkins: Verified Dimitris Tsirogiannis: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 04:EXCHANGE 11.869ms 1.869ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE3 34.225ms 37.978ms 14.82K 15.21K 20.67 MB 10.00 MB FINALIZE 02:EXCHANGE 3 860.854us 1.024ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 74.048ms 93.605ms 15.60K 15.21K 9.99 MB 10.00 MB STREAMING 00:SCAN HDFS3 68.162ms 78.181ms 183.59K 183.59K 3.84 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 136 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/2 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 04:EXCHANGE 11.869ms 1.869ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE3 34.225ms 37.978ms 14.82K 15.21K 20.67 MB 10.00 MB FINALIZE 02:EXCHANGE 3 860.854us 1.024ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 74.048ms 93.605ms 15.60K 15.21K 9.99 MB 10.00 MB STREAMING 00:SCAN HDFS3 68.162ms 78.181ms 183.59K 183.59K 3.84 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 136 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/2 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 04:EXCHANGE 11.869ms 1.869ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE3 34.225ms 37.978ms 14.82K 15.21K 20.67 MB 10.00 MB FINALIZE 02:EXCHANGE 3 860.854us 1.024ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 74.048ms 93.605ms 15.60K 15.21K 9.99 MB 10.00 MB STREAMING 00:SCAN HDFS3 68.162ms 78.181ms 183.59K 183.59K 3.84 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 136 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#11). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M be/src/udf/udf.h 8 files changed, 374 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/11 -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 136 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/2 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 146: // We assume that the dst buffer has already been allocated earlier. This increases > can simplify with: Done Line 150: uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len); > DCHECK(dst->ptr != NULL); Done Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET; > NUM_SAMPLES -> MAX_NUM_SAMPLES Done Line 974: ReservoirSample* at(int64_t idx) { > single line, const function Done Line 979: void push_back(ReservoirSample s) { > use const&, otherwise 's' is copied twice Done Line 982: ReservoirSample* arr = reinterpret_cast(begin()); > no need to reinterpret_cast Done Line 987: ReservoirSample* begin() { > const function (and elsewhere) Done Line 994: ReservoirSample* end() { > single line Done Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* buffer) { > Why not move this function into ReservoirSampleState and make ReservoirSamp Unfortunately it's not possible to encapsulate because we also need to update the pointer in the StringVal buffer. It would be more complicated and error prone to pass it as one of parameters into this function. Line 1007: // can fit 10 times as many samples, up to a maximum of 20,000. We do not allocate > instead of 20,000 use the constant MAX_NUM_SAMPLES Done Line 1033: // If the array gets filled due to updates or merges, we will reallocate a larger > suggest "we reallocate a larger buffer to hold up to a maximum of MAX_NUM_S Done Line 1035: int init_size = (NUM_SAMPLES / 1000); > It would be nice to have this encapsulated in the ReservoirSampleState as w As mentioned above, it's not possible to encapsulate cleanly. Line 1061: resizeReservoirSampleState(ctx, dst); > if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may I just added a check here, because it's not possible to move the logic as you suggested. Line 1147: DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState) + > you can add a helper function to ReservoirSampleState to do this size compu Done. Also added IsFull(). -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS10, Line 127: typename RESULT_T::underlying_type_t > It's applicable to the next line too, right ? The following looks easier to I agree that does look nicer, but I can't do it without polluting the client namespace with whatever typedef I use, unless I put it inside another namespace, like detail, which doesn't really solve the problem either. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc File be/src/exprs/literal.cc: Line 185: value_.decimal4_val = Decimal4Value::FromDouble(type, v, , true); > Shouldn't this still be dependent on the behavior we want to test ? We test I really don't want to complicate tests any more than necessary so since this is just for tests, I just gave this the sane expected behavior. If this happens to break some tests, we'll find out and fix them, but this should not change Impala behavior. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. IMPALA-4546: Fix Moscow timezone conversion after 2014 In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4 with no DST. A special case has been added to timestamp functions to handle this. Testing: Added BE Expr tests. Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h 4 files changed, 90 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/2 -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. IMPALA-4546: Fix Moscow timezone conversion after 2014 In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4 with no DST. A special case has been added to timestamp functions to handle this. Testing: Added BE Expr tests. Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h 4 files changed, 90 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/2 -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/5969/1/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 3588: // IMPALA-4209: Moscow time change in 2011. > Moscow could use a test on its own, rather than adding so many lines to Tim Done PS1, Line 3590: cast(from_utc_timestamp('2010-10-30 22:59:00', " : "'Europe/Moscow') as string) > This could be turned into a template of calling TestStringValue with "cast( Done Line 3596: TestIsNull("cast(to_utc_timestamp('2010-10-31 02:00:00', " > Each one of these lines could use a short comment like "spring forward into Done http://gerrit.cloudera.org:8080/#/c/5969/1/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: PS1, Line 194: January 19, 1992 > What about in 1991? Added comment. In general, Timestamp timezone conversions are broken. http://gerrit.cloudera.org:8080/#/c/5969/1/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: PS1, Line 41: "tz" > 'tz', because the parameter tz is a std::string and so "tz" is a valid valu Done Line 45: /// Moscow timezone UTC+3 with DST, for use before 27 March 2011. > Between 27 March 2011 and 26 October 2014, which timezone was Moscow in? Moscow was in TIMEZONE_MSK_PRE_2014 timezone between 2011 and 2014, which is UTC+4 PS1, Line 48: October 26 2014 > "26 October 2014" for parallelism with above comment. I changed both to Month, Day, Year. -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4937: Remove unused kudu scanner keep alive variable
Alex Behm has posted comments on this change. Change subject: IMPALA-4937: Remove unused kudu scanner keep alive variable .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6021 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99f56f93f1ff8543cbe415542464a053b24968e2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Alex Behm has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 1: (14 comments) Did you verify the memory savings with a simple experiment? http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 146: // We assume that the dst buffer has already been allocated earlier. This increases can simplify with: Same as AllocBuffer() above but for re-allocating 'dst->ptr'. Line 150: uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len); DCHECK(dst->ptr != NULL); Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET; NUM_SAMPLES -> MAX_NUM_SAMPLES Line 974: ReservoirSample* at(int64_t idx) { single line, const function Line 979: void push_back(ReservoirSample s) { use const&, otherwise 's' is copied twice Line 982: ReservoirSample* arr = reinterpret_cast(begin()); no need to reinterpret_cast Line 987: ReservoirSample* begin() { const function (and elsewhere) Line 994: ReservoirSample* end() { single line Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* buffer) { Why not move this function into ReservoirSampleState and make ReservoirSampleState auto-resize itself during push_back()? You can add ctx as a param to push_back. That way all the logic is encapsulated inside ReservoirSampleState and not throughout various agg fn implementations Line 1007: // can fit 10 times as many samples, up to a maximum of 20,000. We do not allocate instead of 20,000 use the constant MAX_NUM_SAMPLES Line 1033: // If the array gets filled due to updates or merges, we will reallocate a larger suggest "we reallocate a larger buffer to hold up to a maximum of MAX_NUM_SAMPLES ReservoirSamples." Line 1035: int init_size = (NUM_SAMPLES / 1000); It would be nice to have this encapsulated in the ReservoirSampleState as well. Line 1061: resizeReservoirSampleState(ctx, dst); if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may cause a crash I suggest moving the reallocation logic into the push_back() of ReservoirSampleState. You might use the return code of push_back() to signal success/failure, so you can handle the failure case here. Line 1147: DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState) + you can add a helper function to ReservoirSampleState to do this size computation -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 1: I'll do a CR later tonight or tmr morning. -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4933: Force thrift to initialize SSL on process startup
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4933: Force thrift to initialize SSL on process startup .. Patch Set 1: Running jenkins job now, will also deploy to a secure cluster for additional testing -- To view, visit http://gerrit.cloudera.org:8080/6027 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I245a8a001103ddca7f07349faa82bbb5b9fe3ab0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4933: Force thrift to initialize SSL on process startup
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/6027 Change subject: IMPALA-4933: Force thrift to initialize SSL on process startup .. IMPALA-4933: Force thrift to initialize SSL on process startup OpenSSL initialization functions are not threadsafe and are currently called by both thrift, squeasel, and the Kudu client. This change forces thrift to initialize OpenSSL on startup by adding a TSSLSocketFactory to the AuthManager which initializes OpenSSL upon creation and lives the lifetime of the process. Then, squeasel is configured to skip the OpenSSL initialization. TODO: When the Kudu client supports a flag to disable its initialization path (KUDU-1738), Impala will call that. In the meantime, there will continue to be some small likelihood of a race. Change-Id: I245a8a001103ddca7f07349faa82bbb5b9fe3ab0 --- M be/src/rpc/authentication.cc M be/src/rpc/authentication.h M be/src/util/webserver.cc 3 files changed, 18 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6027/1 -- To view, visit http://gerrit.cloudera.org:8080/6027 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I245a8a001103ddca7f07349faa82bbb5b9fe3ab0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4840: Fix REFRESH performance regression. .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/278/ -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4840: Fix REFRESH performance regression. .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes
Bharath Vissapragada has uploaded a new patch set (#13). Change subject: IMPALA-4822: Implement dynamic log level changes .. IMPALA-4822: Implement dynamic log level changes Very often we have to change the logging levels of Impalads and Catalog server for debugging purposes. Currently, there is no way to do that without a restart of the daemons, which is not a viable option in production deployments. This patch addresses this supportability gap by exposing the ability to set dynamic logging levels using a simple web endpoint on Impalad/Catalog/Statestore web pages. This includes setting VLOG levels (equivalent to --v flag) as well as setting log4j levels on the Frontend and the Catalog JVMs. Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 --- M be/src/catalog/catalog-server.cc M be/src/service/impala-http-handler.cc M be/src/statestore/statestore.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/logging-support.cc M be/src/util/logging-support.h M common/thrift/Logging.thrift M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/util/GlogAppender.java M tests/webserver/test_web_pages.py A www/log_level.tmpl 12 files changed, 473 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/13 -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4822: Implement dynamic log level changes .. Patch Set 12: (16 comments) Fxed a minor bug in the web UI where the setting for java log level changes were showing up on the statestore web UI which doesn't have a JVM. http://gerrit.cloudera.org:8080/#/c/5792/12/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 209: > remove blank line Done http://gerrit.cloudera.org:8080/#/c/5792/12/be/src/util/jni-util.h File be/src/util/jni-util.h: PS12, Line 230: the above > explicitly say LoadJniMethod. Otherwise someone might put a method between haha, good point. Changed it. PS12, Line 234: It seems these : /// must be defined in the header to compile properly. > can you remove this line (I probably wrote it...)? templates need to be in Done http://gerrit.cloudera.org:8080/#/c/5792/12/be/src/util/logging-support.cc File be/src/util/logging-support.cc: PS12, Line 129: Status SetJavaLogLevel(const TSetJavaLogLevelParams& params, string* result) { : RETURN_IF_ERROR( : JniUtil::CallJniMethod(log4j_logger_class_, set_log_level_method, params, result)); : return Status::OK(); : } > Is this called in more than one place? Otherwise, let's just inline it into Yea that is the only place. I've written this way think it would be more readable but I agree it looks redundant. PS12, Line 143: const Status& status > What I mean was: rather than constructing a status every time you want to c Ah sorry misunderstood. Made the change. PS12, Line 149: SetDisplayResult > how about 'AddDocumentMember()" ? That sounds better. Done. PS12, Line 158: log_getclass->second == nullptr > Please remove the comparisons to nullptr for strings, and replace with .emp Done PS12, Line 215: return > Maybe make this case an error as well so that user knows what's missing. Done PS12, Line 223: std:: > remove Done PS12, Line 229: std:: > remove Done Line 307: void RegisterLogLevelCallbacks(const string& url, Webserver* webserver, bool register_log4j_handlers) { > long line (consider using git-clang-format) Done. PS12, Line 310: set_glog_log > sorry to nitpick - but set_glog_log seems repetitive. Why not set_glog_leve Valid point. Changed. http://gerrit.cloudera.org:8080/#/c/5792/12/be/src/util/logging-support.h File be/src/util/logging-support.h: PS12, Line 39: init_log4j > update Done PS12, Line 40: const std::string& url > is this needed, or is it the same for all callers? Removed, its the same for all callers. http://gerrit.cloudera.org:8080/#/c/5792/12/fe/src/main/java/org/apache/impala/util/GlogAppender.java File fe/src/main/java/org/apache/impala/util/GlogAppender.java: Line 46: > remove blank line Done http://gerrit.cloudera.org:8080/#/c/5792/12/tests/webserver/test_web_pages.py File tests/webserver/test_web_pages.py: PS12, Line 86: \ > remove where you have used parentheses Done -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Alex Behm has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/5840/7/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 101: def test_kudu_col_changed(self, cursor, kudu_client, unique_database): Tests lgtm, but there seems to be a lot of repeated boilerplate stuff. Maybe you can take another pass and see if it makes sense to factor out some parts to condense the code? Line 106: kudu_tbl_name = "impala::%s.foo" % unique_database seems worth factoring out this function and adding a comment that it must be kept in sync with the FE table-name-generation behavior -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/6025 Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc 1 file changed, 124 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/1 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function
anujphadke has posted comments on this change. Change subject: IMPALA-4848: Add WIDHT_BUCKET() function .. Patch Set 1: Will add a commit message. Results of the tests match the postgres output. aphadke-MBP-2:~ aphadke$ "/Applications/Postgres.app/Contents/Versions/9.6/bin/psql" -p5432 -d "aphadke" psql (9.6.1) Type "help" for help. aphadke=# aphadke=# aphadke=# aphadke=# aphadke=# select width_bucket(6.3, 2, 17, 2); width_bucket -- 1 (1 row) aphadke=# select width_bucket(11, 6, 14, 3); width_bucket -- 2 (1 row) aphadke=# select width_bucket(-1, -5, 5, 3); width_bucket -- 2 (1 row) aphadke=# select width_bucket(1, -5, 5, 3); width_bucket -- 2 (1 row) aphadke=# select width_bucket(3, 5, 20.1, 4); width_bucket -- 0 (1 row) aphadke=# select width_bucket(22, 5, 20.1, 4); width_bucket -- 5 (1 row) aphadke=# select width_bucket(NULL, 2, 14, 4); width_bucket -- (1 row) aphadke=# select width_bucket(NULL, 2, 14, 0); width_bucket -- (1 row) aphadke=# select width_bucket(22, 5, 20.1, 0); ERROR: count must be greater than zero aphadke=# select width_bucket(22, 5, 5, 0); ERROR: count must be greater than zero aphadke=# -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 7: (40 comments) http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 145: DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc())); This should be IsPrefixOf() because we sanity checking the row descriptors of exec nodes (and not row batches). http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 115: // passthrough case, the child can be closed right away because the row batch received the child can be closed right away -> the child was already closed? Line 116: // from the child is copied (more details below). accuracy: the row batch wasn't copied Line 121: if (child_idx_ < children_.size() && isChildPassThrough(child_idx_)) { Suggest comment: // Handle passthrough children. We pass 'row_batch' directly into the GetNext() call on the child. Line 122: // If the rows from the current child can be passed through, the physical row layout This comment doesn't seem to add anything, I suggest removing it. Line 131: // It may be possible that the row batch is not empty, so we save the previous value. More details would be helpful. If the batch has rows at this point, I'd imagine it can cause all sorts of other problems. How can the batch already have rows? Line 148: // 'needs_deep_copy' lets us safely close the child in the next GetNext() call. style: 'needs_deep_copy' is not a visible variable here, suggest just saying "Marking the batch as needing a deep copy let's us ... Line 154: DCHECK that child_idx_ is not passthrough here http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 67: // Which children can be passed through, without being materialized. ... without evaluating and materializing their exprs. http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 412: /// Return true if the physical layout of this descriptor matches the physical layout matches that of other_desc Line 413: /// of other_desc, but not necessarily ids. bot not necessarily the id. Line 565: /// of other_desc, but not necessarily ids. but not necessarily the ids http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h File be/src/service/query-options.h: Line 38: TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\ I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a little easier to understand. What do you think? http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 227: public boolean isEquivalent(SlotDescriptor other) { Unfortunately, the term 'equivalent' already has a different meaning in the FE for slots, so it would be good to the existing term fro this new one. Maybe isLayoutEquivalent()? http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java: Line 616: public List getUnionNodeResultExprs() { getUnionResultExprs() http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 47: * a child has an identical tuple layout as the output of the union node, the ... as the output of the union node, and the child only has naked SlotRefs as result exprs, then the child is marked as 'passthrough'. The rows of passthrough children are directly returned by the union node, instead of materializing the child's result exprs into new tuples. Line 57: protected final List resultExprs_; unionResultExprs_ to make distinguish it from the resultExprLists_ and such Line 73: // If false, no batches from child nodes would be passed through in the backend. Comment should describe what this flag is. Also you mean "true" right? Line 76: // Indicates for which child nodes batches can be passed through in the backend. remove "in the backend" (it's clear that execution happens there) Line 81: protected UnionNode(PlanNodeId id, TupleId tupleId) { Is this c'tor still needed? If not, remove. Line 89: List resultExprs, boolean isInSubplan) { indentation, unionResultExprs Line 182:* Compute whether we can pass through rows without materializing for the given child. Can combine/simplify like this: Returns true if rows from the child with 'childTupleIds' and 'childResultExprs' can be returned directly by the union node (without materialization into a new tuple). Might be good to list the conditions for passthrough
[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function
anujphadke has uploaded a new change for review. http://gerrit.cloudera.org:8080/6023 Change subject: IMPALA-4848: Add WIDHT_BUCKET() function .. IMPALA-4848: Add WIDHT_BUCKET() function Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f --- M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M common/function-registry/impala_functions.py 4 files changed, 32 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/1 -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke
[Impala-ASF-CR] IMPALA-4931: Update squeasel to include patch to disable SSL init
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4931: Update squeasel to include patch to disable SSL init .. Patch Set 2: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/277/ -- To view, visit http://gerrit.cloudera.org:8080/6006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d28f2a5391b8a7d39a50002ff1d96ef3d927567 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Hello Thomas Tauber-Marshall, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5840 to look at the new patch set (#7). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner (with the exception of KUDU-1881), but the issue is that during planning, Impala assumes the types/nullability of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. Test cases were added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/query_test/test_kudu.py 2 files changed, 216 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/7 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Alex Behm has posted comments on this change. Change subject: IMPALA-4840: Fix REFRESH performance regression. .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.cc File be/src/util/thread.cc: Line 178: void ThreadOverviewUrlCallback(bool track_jvm_threads, Shouldn't this go above ThreadGroupUrlCallback() to match with the sample jsons? I think the one right above it now corresponds to ThreadGroupUrlCallback()? No? PS1, Line 194: [this, track_jvm_threads] (const Webserver::ArgumentMap& args, Document* doc) { : this->ThreadOverviewUrlCallback(track_jvm_threads, args, doc) Make this a method? MakeUrlCallBack or something, used multiple times. PS1, Line 327: INFO warn/error? How about bubbling this error up to the web endpoint by setting the "error" member? (Other places too). http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.h File be/src/util/thread.h: Line 191: /// the "thread-manager." Describe track_jvm_threads in the method comment? http://gerrit.cloudera.org:8080/#/c/6013/1/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: Line 720: // Summary of a JVM thread. Includes stacktraces, locked monitors Do you think its good to include the locked synchronizers or the lock it is blocked on? (ThreadInfo#getLockInfo() / getLockOwnerId() etc.).. Helps debugging hangs. -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Bharath Vissapragada has uploaded a new patch set (#3). Change subject: IMPALA-4840: Fix REFRESH performance regression. .. IMPALA-4840: Fix REFRESH performance regression. The fix for IMPALA-4172 introduced a regression in performance of the REFRESH command. The regression stems from the fact that we reload the block metadata of every valid data file without considering whether it has changed since the last load. This caused unnecessary metadata loads for unchanged files and thus increasing the runtime. The fix involves having the refresh codepath (and other operations that use the same codepath like insert etc.) to reload the metadata of only modified files by doing a listStatus() on the partition directory and checking the last modified time of each file. Without this patch, we relied on listFiles(), which fetched the block locations irrespective of whether the file has changed and it was significantly slower on unchanged tables. The initial/invalidate metadata load still fetches the block locations in bulk using listFiles(). The side effect of this change is that the refresh no longer picks up block location changes after HDFS block rebalancing. We suggest using "invalidate metadata" for that which loads the metadata from scratch. Additionally, this commit enables the reuse of metadata during table refresh (which was disabled in IMPALA-4172) to prevent reloading metadata from HMS everytime. Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 2 files changed, 110 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/6009/3 -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4840: Fix REFRESH performance regression. .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/6009/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 787:* for modified files on HDFS. This method uses a FileSystem.listStatus() call on the > Suggest something like: Done Line 789:* block locations using FileSystem.getFileBlockLocations() method. The initial table > the FileSystem.getFileBlockLocations() method Done Line 793:* (up to ~40x slower in some cases) and hence it is implemented this way to optimize > suggest moving this wording to the top as suggested above Done Line 836: if (unknownDiskIdCount > 0 && LOG.isWarnEnabled()) { > remove the disk id warning as you suggested I still retained it in the initial table load using loadBlockMetadata(). I think that helps debugging. Removed it here though. -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Alex Behm has posted comments on this change. Change subject: IMPALA-4840: Fix REFRESH performance regression. .. Patch Set 2: (4 comments) Getting close. http://gerrit.cloudera.org:8080/#/c/6009/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 787:* for modified files on HDFS. This method uses a FileSystem.listStatus() call on the Suggest something like: This method is optimized for the case where the files in the partition have not changed dramatically. It first uses FileSystem.listStatus() ... then you can mention the perf difference between these functions Line 789:* block locations using FileSystem.getFileBlockLocations() method. The initial table the FileSystem.getFileBlockLocations() method Line 793:* (up to ~40x slower in some cases) and hence it is implemented this way to optimize suggest moving this wording to the top as suggested above Line 836: if (unknownDiskIdCount > 0 && LOG.isWarnEnabled()) { remove the disk id warning as you suggested -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 149: > I think Thomas is right and the query may crash if there is a difference in I couldn't reproduce any failures, but looking at TupleDescriptor I agree it seems like it could be possible to crash. Good call Thomas & Alex. I'll check that here too. Will the Kudu-side validation check that the projection is 100% identical to what the scan token expects? Nullability etc. included? In theory it should, thought it looks like it's missing on the Kudu side as well, so I filed a JIRA on them https://issues.apache.org/jira/browse/KUDU-1881 . Thanks for pointing this out. http://gerrit.cloudera.org:8080/#/c/5840/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 161: if (!colType.matchesType(kuduColType)) { > Shouldn't this use equals() instead of matchesType()? Done -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Bharath Vissapragada has uploaded a new patch set (#2). Change subject: IMPALA-4840: Fix REFRESH performance regression. .. IMPALA-4840: Fix REFRESH performance regression. The fix for IMPALA-4172 introduced a regression in performance of the REFRESH command. The regression stems from the fact that we reload the block metadata of every valid data file without considering whether it has changed since the last load. This caused unnecessary metadata loads for unchanged files and thus increasing the runtime. The fix involves having the refresh codepath (and other operations that use the same codepath like insert etc.) to reload the metadata of only modified files by doing a listStatus() on the partition directory and checking the last modified time of each file. Without this patch, we relied on listFiles(), which fetched the block locations irrespective of whether the file has changed and it was significantly slower on unchanged tables. The initial/invalidate metadata load still fetches the block locations in bulk using listFiles(). The side effect of this change is that the refresh no longer picks up block location changes after HDFS block rebalancing. We suggest using "invalidate metadata" for that which loads the metadata from scratch. Additionally, this commit enables the reuse of metadata during table refresh (which was disabled in IMPALA-4172) to prevent reloading metadata from HMS everytime. Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 2 files changed, 116 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/6009/2 -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4931: Update squeasel to include patch to disable SSL init
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4931: Update squeasel to include patch to disable SSL init .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/277/ -- To view, visit http://gerrit.cloudera.org:8080/6006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d28f2a5391b8a7d39a50002ff1d96ef3d927567 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4931: Update squeasel to include patch to disable SSL init
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4931: Update squeasel to include patch to disable SSL init .. Patch Set 2: Code-Review+2 removed the old cmake line and rebased -- To view, visit http://gerrit.cloudera.org:8080/6006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d28f2a5391b8a7d39a50002ff1d96ef3d927567 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4931: Update squeasel to include patch to disable SSL init
Hello Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6006 to look at the new patch set (#2). Change subject: IMPALA-4931: Update squeasel to include patch to disable SSL init .. IMPALA-4931: Update squeasel to include patch to disable SSL init Update Squeasel in thirdparty to c304d3f3481b07bf153979155f02e0aab24d01de Change-Id: I3d28f2a5391b8a7d39a50002ff1d96ef3d927567 --- M be/src/thirdparty/squeasel/squeasel.c M be/src/util/CMakeLists.txt 2 files changed, 36 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6006/2 -- To view, visit http://gerrit.cloudera.org:8080/6006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d28f2a5391b8a7d39a50002ff1d96ef3d927567 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] Revert "IMPALA-4829: Change default Kudu read behavior for "RYW""
Matthew Jacobs has posted comments on this change. Change subject: Revert "IMPALA-4829: Change default Kudu read behavior for "RYW"" .. Patch Set 3: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/5970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I995dec543946c9e0f79bc5b7e82568060a9d8262 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] Revert "IMPALA-4829: Change default Kudu read behavior for "RYW""
Impala Public Jenkins has posted comments on this change. Change subject: Revert "IMPALA-4829: Change default Kudu read behavior for "RYW"" .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/276/ -- To view, visit http://gerrit.cloudera.org:8080/5970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I995dec543946c9e0f79bc5b7e82568060a9d8262 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] Revert "IMPALA-4829: Change default Kudu read behavior for "RYW""
Impala Public Jenkins has posted comments on this change. Change subject: Revert "IMPALA-4829: Change default Kudu read behavior for "RYW"" .. Patch Set 2: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/274/ -- To view, visit http://gerrit.cloudera.org:8080/5970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I995dec543946c9e0f79bc5b7e82568060a9d8262 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/275/ -- To view, visit http://gerrit.cloudera.org:8080/5980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
Alex Behm has posted comments on this change. Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
Alex Behm has posted comments on this change. Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/5980/2/fe/src/main/java/org/apache/impala/util/DisjointSet.java File fe/src/main/java/org/apache/impala/util/DisjointSet.java: PS2, Line 49: private final IdentityHashMapuniqueSets_ = : new IdentityHashMap (); > nice idea :) It was pretty much your idea :) PS2, Line 111: checkState(removedValue == DUMMY_VALUE); > I think checking for not NULL is also ok. This one seems more explicit because if we got back a non-NULL value that is not DUMMY_VALUE, then something bad has happened. -- To view, visit http://gerrit.cloudera.org:8080/5980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] Revert "IMPALA-4829: Change default Kudu read behavior for "RYW""
Impala Public Jenkins has posted comments on this change. Change subject: Revert "IMPALA-4829: Change default Kudu read behavior for "RYW"" .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/274/ -- To view, visit http://gerrit.cloudera.org:8080/5970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I995dec543946c9e0f79bc5b7e82568060a9d8262 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 1: A publicly available screenshot: https://drive.google.com/file/d/0B7VW0hRyzVlKUGdWakNIdlhtbTQ/view?usp=sharing -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers
Jim Apple has posted comments on this change. Change subject: IMPALA-2020: Make it easy to work with big numbers .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5902/4/be/src/util/decimal-util.cc File be/src/util/decimal-util.cc: Line 25: static constexpr T AllNines(unsigned precision) { If it's going to be this specific, why not const int128_t DecimalUtil::MAX_UNSCALED_DECIMAL16 = 99 + 100 * (MAX_UNSCALED_DECIMAL8 + (1 + MAX_UNSCALED_DECIMAL8) * static_cast(MAX_UNSCALED_DECIMAL8)); -- To view, visit http://gerrit.cloudera.org:8080/5902 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Henry Robinson has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 1: This is super useful. Could you post the screenshot somewhere that's publicly accessible? -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) A blog post about IMPALA-4916
Jim Apple has posted comments on this change. Change subject: A blog post about IMPALA-4916 .. Patch Set 3: > (1 comment) > > So, I believe that all of the pages still point to the old blog > site. This patch doesn't seem to address that just yet. Was that by > intention? It wasn't but now that I'm thinking about it, I'd say this might be the right option. Maybe we should get a few posts +2ed and ready to commit, as well as a sustainable cadence going, before posting. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/5995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa90bf5621ef6466a4821f77a6e8a8b20c5512ae Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. .. Patch Set 2: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/5980/2/fe/src/main/java/org/apache/impala/util/DisjointSet.java File fe/src/main/java/org/apache/impala/util/DisjointSet.java: PS2, Line 49: private final IdentityHashMapuniqueSets_ = : new IdentityHashMap (); nice idea :) PS2, Line 111: checkState(removedValue == DUMMY_VALUE); I think checking for not NULL is also ok. -- To view, visit http://gerrit.cloudera.org:8080/5980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes