[Impala-ASF-CR] IMPALA-10117: Skip calls to FsPermissionCache for blob stores
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16898 ) Change subject: IMPALA-10117: Skip calls to FsPermissionCache for blob stores .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7900/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8 Gerrit-Change-Number: 16898 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 23 Dec 2020 05:36:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10117: Skip calls to FsPermissionCache for blob stores
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16898 ) Change subject: IMPALA-10117: Skip calls to FsPermissionCache for blob stores .. Patch Set 1: I noticed the JIRA from Sahil and figured it was an easy fix to speed up table loading. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8 Gerrit-Change-Number: 16898 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 23 Dec 2020 05:15:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10117: Skip calls to FsPermissionCache for blob stores
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16898 ) Change subject: IMPALA-10117: Skip calls to FsPermissionCache for blob stores .. IMPALA-10117: Skip calls to FsPermissionCache for blob stores This avoids calling precacheChildrenOf() in cases where the cached values will never be used. The change simply skips calling precacheChildrenOf() in the cases where getPermissions() is never called. There is some opportunity to clean up this permissions checking further, but I decided to keep this fix limited in scope. Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 1 file changed, 27 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16898/1 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8 Gerrit-Change-Number: 16898 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
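A minimal sketch of the idea, assuming a simplified catalog loader: precaching only pays for its directory listing when at least one partition location will later have its permissions looked up. All names here (`BLOB_STORE_SCHEMES`, `is_blob_store`, `PermissionCache`, `load_partitions`) are hypothetical stand-ins rather than the HdfsTable.java code, and the scheme list is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, illustrative set of blob-store URI schemes; the change
# description does not say exactly which filesystems are treated this way.
BLOB_STORE_SCHEMES = {"s3a", "abfs", "abfss", "gs"}


def is_blob_store(location: str) -> bool:
    """Return True if the location's URI scheme is in the hypothetical set."""
    scheme = location.split("://", 1)[0] if "://" in location else "hdfs"
    return scheme in BLOB_STORE_SCHEMES


@dataclass
class PermissionCache:
    """Toy stand-in for FsPermissionCache that counts precache calls."""
    precache_calls: int = 0
    cached: Dict[str, str] = field(default_factory=dict)

    def precache_children_of(self, parent: str) -> None:
        # A real cache would list the parent directory here: the remote
        # round trip that is wasted when nothing reads the result.
        self.precache_calls += 1
        self.cached[parent] = "listed"


def load_partitions(parent: str, locations: List[str], cache: PermissionCache) -> None:
    # In this sketch permissions are only looked up for non-blob-store
    # locations, so precaching is skipped when every location is a blob store.
    if any(not is_blob_store(loc) for loc in locations):
        cache.precache_children_of(parent)
```

With this shape, a table whose partitions all live under `s3a://` never triggers a precache call, while an HDFS-backed table still gets one listing per parent directory.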
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6803/ DRY_RUN=false -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 23 Dec 2020 05:11:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 5: Patch set 5 is a rebase on top of the IMPALA-9550 fix. The flakiness should be gone now. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 23 Dec 2020 01:33:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16895 ) Change subject: IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch .. IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch TestResultSpoolingFetchSize.test_fetch has been flaky in the ubuntu-16.04-dockerised environment because it sometimes did not reach the finished state within 10 seconds. This patch increases the test's timeout to 30 seconds. Testing: - Looped the test locally. Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Reviewed-on: http://gerrit.cloudera.org:8080/16895 Reviewed-by: Bikramjeet Vig Tested-by: Impala Public Jenkins --- M tests/query_test/test_result_spooling.py 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Bikramjeet Vig: Looks good to me, approved Impala Public Jenkins: Verified -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Gerrit-Change-Number: 16895 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto
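The fix only widens the time budget from 10 to 30 seconds. A hedged sketch of a generic poll-until-state helper shows why a larger timeout costs little on healthy runs: the loop returns as soon as the target state is observed, so only slow environments use the extra time. The name `wait_for_state` and its parameters are hypothetical here, not the helper used in test_result_spooling.py.

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], target: str,
                   timeout_s: float = 30.0, poll_interval_s: float = 0.1) -> float:
    """Poll get_state() until it returns target; return the elapsed seconds.

    Raises TimeoutError if the target state is not seen within timeout_s.
    """
    start = time.monotonic()
    while True:
        state = get_state()
        if state == target:
            # Fast path: a healthy run exits here long before the timeout.
            return time.monotonic() - start
        elapsed = time.monotonic() - start
        if elapsed >= timeout_s:
            raise TimeoutError(
                f"state {state!r} did not reach {target!r} within {timeout_s}s")
        time.sleep(poll_interval_s)
```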
[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16895 ) Change subject: IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch .. Patch Set 1: Verified+1 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Gerrit-Change-Number: 16895 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 23 Dec 2020 00:54:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 40: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7899/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 40 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 23 Dec 2020 00:30:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 40: (4 comments) http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@757 PS40, Line 757: options.setDisable_overlap_filter(true); // Required so that output doesn't vary by whether parquet tables are used or not. line too long (127 > 90) http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@787 PS40, Line 787: options.setDisable_overlap_filter(true); // Required so that output doesn't vary by the format of the table used. line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@787 PS40, Line 787: options.setDisable_overlap_filter(true); // Required so that output doesn't vary by the format of the table used. 
line too long (118 > 90) http://gerrit.cloudera.org:8080/#/c/16720/40/tests/run-tests.py File tests/run-tests.py: http://gerrit.cloudera.org:8080/#/c/16720/40/tests/run-tests.py@219 PS40, Line 219: % flake8: E131 continuation line unaligned for hanging indent -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 40 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 23 Dec 2020 00:08:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#40). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
This patch adds a new class of predicates, called overlap predicates, which determine whether a Parquet row group or page overlaps with a range computed from an equi hash join. If it does not, the entire row group or page is skipped.
An overlap predicate exists as a min/max filter. For the following query, the min and max of such a filter are computed from the values of the join column of table 'b' and become fully available once the entire hash table is built. To evaluate the overlap predicate, these two values are compared against the min/max of each row group or page at the scan node for 'a'.
select straight_join count(*) from lineitem_sorted_l_shipdate a join [SHUFFLE] lineitem_sorted_l_shipdate b where a.l_shipdate = b.l_receiptdate and b.l_commitdate = "1992-01-31";
An overlap predicate associated with the join column type J (in the hash table) and the scan column type S is formed when one of the following is true:
- Both J and S are booleans
- Both J and S are integers (tinyint, smallint, int, or bigint)
- Both J and S are approximate numeric (float or double)
- Both J and S are decimals with the same precision and scale
- Both J and S are strings (STRING, CHAR or VARCHAR)
- Both J and S are dates
- Both J and S are timestamps
As with existing min/max filters, the MAX_NUM_RUNTIME_FILTERS query option does not apply to the min/max filters created for overlap predicates. Overlap predicates are always evaluated, after the min/max conjuncts (if any).
Two new runtime profile counters report the number of row groups and pages, respectively, filtered out by overlap predicates:
1. NumRuntimeFilteredRowGroups
2. NumRuntimeFilteredPages
Testing:
1. Unit tested on various column types with TPC-H and TPC-DS tables. Benefits were significant when the join column of the outer table is sorted, or when the min/max boundary values of the pages or row groups are monotonic;
2. Added new tests in min_max_filters.test to demonstrate the number of filtered-out pages and row groups with the two new profile counters;
3. Added new tests in runtime-filter-propagation.test to demonstrate that the overlap predicates work with different column types;
4. Added data-type-specific overlap method tests in min-max-filter-test.cc;
5. Core testing.
TBD in this patch:
1. Performance measurement.
To do in follow-up JIRAs:
1. Apply the overlap predicate on partition columns;
2. Apply the overlap predicate on each row;
3. IR code-gen for various MinMaxFilter::EvalOverlap methods.
Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test M testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test A testdata/workloads/functional-planner/queries/PlannerTest/disable-runtime-overlap-filter.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M
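The overlap test itself is a closed-interval intersection check. The sketch below is a Python model, not the backend MinMaxFilter::EvalOverlap code: the class and function names are hypothetical, the dates are made up, and ISO date strings are used because they compare in chronological order. The `skipped` count plays the role that NumRuntimeFilteredRowGroups (or NumRuntimeFilteredPages) plays in the profile.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class MinMaxFilter:
    """Min/max of the build-side join column (table 'b' in the query above)."""
    min_val: Any
    max_val: Any

    @classmethod
    def from_build_values(cls, values: Sequence[Any]) -> "MinMaxFilter":
        # Only complete once the whole hash table has been built.
        return cls(min(values), max(values))

    def overlaps(self, lo: Any, hi: Any) -> bool:
        # Closed ranges [min_val, max_val] and [lo, hi] overlap iff each
        # range starts no later than the other one ends.
        return self.min_val <= hi and lo <= self.max_val


def prune(flt: MinMaxFilter, stats: List[Tuple[Any, Any]]) -> Tuple[List[int], int]:
    """Return indexes of row groups (or pages) to read and how many were skipped."""
    keep = [i for i, (lo, hi) in enumerate(stats) if flt.overlaps(lo, hi)]
    return keep, len(stats) - len(keep)


# Made-up values: build-side dates, plus per-row-group (min, max) stats of a
# probe column sorted by date, as with lineitem_sorted_l_shipdate.
flt = MinMaxFilter.from_build_values(["1992-02-10", "1992-02-25", "1992-02-14"])
row_groups = [("1992-01-01", "1992-01-31"), ("1992-02-01", "1992-02-15"),
              ("1992-02-16", "1992-02-29"), ("1992-03-01", "1992-03-31")]
keep, skipped = prune(flt, row_groups)
print(keep, skipped)  # [1, 2] 2
```

Because the probe column is sorted, each row group covers a narrow, non-overlapping date range, which is why the commit message reports the largest benefit for sorted or monotonic data.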
[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16895 ) Change subject: IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6801/ DRY_RUN=false -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Gerrit-Change-Number: 16895 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Dec 2020 19:17:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 39: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7898/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 39 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 22 Dec 2020 19:16:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16895 ) Change subject: IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch .. Patch Set 1: Code-Review+2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Gerrit-Change-Number: 16895 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Dec 2020 19:16:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 39: (1 comment) http://gerrit.cloudera.org:8080/#/c/16720/39/tests/run-tests.py File tests/run-tests.py: http://gerrit.cloudera.org:8080/#/c/16720/39/tests/run-tests.py@219 PS39, Line 219: % flake8: E131 continuation line unaligned for hanging indent -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 39 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 22 Dec 2020 18:55:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#39). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
This patch adds a new class of predicates, called overlap predicates, which determine whether a Parquet row group or page overlaps with a range computed from an equi hash join. If it does not, the entire row group or page is skipped.
An overlap predicate exists as a min/max filter. For the following query, the min and max of such a filter are computed from the values of the join column of table 'b' and become fully available once the entire hash table is built. To evaluate the overlap predicate, these two values are compared against the min/max of each row group or page at the scan node for 'a'.
select straight_join count(*) from lineitem_sorted_l_shipdate a join [SHUFFLE] lineitem_sorted_l_shipdate b where a.l_shipdate = b.l_receiptdate and b.l_commitdate = "1992-01-31";
An overlap predicate associated with the join column type J (in the hash table) and the scan column type S is formed when one of the following is true:
- Both J and S are booleans
- Both J and S are integers (tinyint, smallint, int, or bigint)
- Both J and S are approximate numeric (float or double)
- Both J and S are decimals with the same precision and scale
- Both J and S are strings (STRING, CHAR or VARCHAR)
- Both J and S are dates
- Both J and S are timestamps
As with existing min/max filters, the MAX_NUM_RUNTIME_FILTERS query option does not apply to the min/max filters created for overlap predicates. Overlap predicates are always evaluated, after the min/max conjuncts (if any).
Two new runtime profile counters report the number of row groups and pages, respectively, filtered out by overlap predicates:
1. NumRuntimeFilteredRowGroups
2. NumRuntimeFilteredPages
Testing:
1. Unit tested on various column types with TPC-H and TPC-DS tables. Benefits were significant when the join column of the outer table is sorted, or when the min/max boundary values of the pages or row groups are monotonic;
2. Added new tests in min_max_filters.test to demonstrate the number of filtered-out pages and row groups with the two new profile counters;
3. Added new tests in runtime-filter-propagation.test to demonstrate that the overlap predicates work with different column types;
4. Added data-type-specific overlap method tests in min-max-filter-test.cc;
5. Core testing.
TBD in this patch:
1. Performance measurement.
To do in follow-up JIRAs:
1. Apply the overlap predicate on partition columns;
2. IR code-gen for various MinMaxFilter::EvalOverlap methods.
Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/disable-runtime-overlap-filter.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test M tests/run-tests.py 36 files changed, 1,869 insertions(+), 252 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/39 -- Gerrit-Project:
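The type rules listed in the commit message can be read as a small eligibility check. This is a hypothetical sketch, not the RuntimeFilterGenerator.java logic: column types are modelled as `(name, precision, scale)` tuples, and "both are integers" / "both are approximate numeric" are read as allowing different widths (for example INT against BIGINT), which is an assumption the message does not spell out.

```python
from typing import Optional, Tuple

INTEGER_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT"}
APPROX_NUMERIC_TYPES = {"FLOAT", "DOUBLE"}
STRING_TYPES = {"STRING", "CHAR", "VARCHAR"}
SUPPORTED_FAMILIES = {"boolean", "integer", "approx", "string", "date", "timestamp"}

# Hypothetical model of a column type: (name, precision, scale). Precision
# and scale are only meaningful for DECIMAL and are None otherwise.
ColType = Tuple[str, Optional[int], Optional[int]]


def _family(t: ColType) -> str:
    name = t[0]
    if name in INTEGER_TYPES:
        return "integer"
    if name in APPROX_NUMERIC_TYPES:
        return "approx"
    if name in STRING_TYPES:
        return "string"
    return name.lower()  # boolean, decimal, date, timestamp, or unsupported


def can_form_overlap_predicate(j: ColType, s: ColType) -> bool:
    """Join column type J vs. scan column type S, per the listed rules."""
    fam_j, fam_s = _family(j), _family(s)
    if fam_j != fam_s:
        return False
    if fam_j == "decimal":
        # Decimals must agree on both precision and scale.
        return j[1:] == s[1:]
    return fam_j in SUPPORTED_FAMILIES
```

For example, `("DECIMAL", 10, 2)` against `("DECIMAL", 12, 2)` is rejected, while `("VARCHAR", None, None)` against `("CHAR", None, None)` is accepted.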
[Impala-ASF-CR] IMPALA-9975 (part 2): Introduce new admission control daemon
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16891 ) Change subject: IMPALA-9975 (part 2): Introduce new admission control daemon .. Patch Set 1: (3 comments) Nice refactoring. I had a couple of high-level questions. http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/runtime/exec-env.cc@289 PS1, Line 289: if (FLAGS_is_coordinator && FLAGS_admission_control_service_addr.empty()) { IIRC the executors do actually need an AdmissionController to report the memory usage from the pool MemTrackers. Looking at the code there's a lot of indirection, but basically UpdateMemTrackerStats() gets the memory usage from the pool mem trackers and that gets added to the topic update. I think this is not as relevant as it once was, but it does play a role when we're running without memory limits or where there's significant untracked memory. So we want to preserve this behaviour at least in the standard config. We could consider disabling this when the admission control daemon is in use - it's less relevant when we don't need to deal with distributed admission control. http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/scheduling/admissiond-main.cc File be/src/scheduling/admissiond-main.cc: http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/scheduling/admissiond-main.cc@44 PS1, Line 44: ABORT_IF_ERROR(daemon_env.Init(/* init_jvm */ false)); Does the admission controller need to load the llama-site.xml and fair-scheduler.xml configs? Wondering if it might actually need a JVM for that purpose. http://gerrit.cloudera.org:8080/#/c/16891/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/16891/1/bin/start-impala-cluster.py@413 PS1, Line 413: 127.0.0.1 Maybe should be INTERNAL_LISTEN_HOST. Not sure it matters though. 
-- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9 Gerrit-Change-Number: 16891 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 22 Dec 2020 18:32:07 + Gerrit-HasComments: Yes
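The first review comment comes down to which daemons should construct an AdmissionController. A minimal sketch of the behaviour the reviewer asks to preserve, assuming plain booleans in place of the coordinator/executor role flags and a string in place of FLAGS_admission_control_service_addr. The function name is hypothetical, and the second branch models the reviewer's "could consider disabling" idea rather than a settled design.

```python
def needs_local_admission_controller(is_coordinator: bool, is_executor: bool,
                                     admission_service_addr: str) -> bool:
    """Hypothetical model of the condition discussed at exec-env.cc line 289."""
    if not admission_service_addr:
        # Standard config: coordinators admit queries locally, and executors
        # still report pool MemTracker usage for the admission topic update.
        return is_coordinator or is_executor
    # A separate admission control daemon is in use: per-executor reports
    # matter less without distributed admission control (reviewer's idea).
    return False
```

Compared with the reviewed condition, `FLAGS_is_coordinator && FLAGS_admission_control_service_addr.empty()`, the only difference is the executor-only case with an empty service address.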
[Impala-ASF-CR] IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16893 ) Change subject: IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7897/ : Initial code review checks failed. See linked job for details on the failure. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90942022edcd5d96c720a1bde53879e50394660a Gerrit-Change-Number: 16893 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 22 Dec 2020 17:52:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages
Tim Armstrong has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16893 ) Change subject: IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages .. IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages
This adds support for decoding pages that use the RLE_DICTIONARY enum value instead of the old PLAIN/PLAIN_DICTIONARY values. A hidden option, -use_new_parquet_dictionary_encodings, is also added to turn on writing with the new encoding, for test purposes only.
Testing:
* Added an automated test using a pregenerated test file.
* Ran core tests.
* Manually tested by writing out TPC-H lineitem with the new encoding and reading it back in Impala and Hive.
Parquet-tools output for the generated test file:
$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/att/824de2afebad009f-6f460ade0003_643159826_data.0.parq
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: reading another 1 footers
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file: hdfs://localhost:20500/test-warehouse/att/824de2afebad009f-6f460ade0003_643159826_data.0.parq
creator: impala version 4.0.0-SNAPSHOT (build 7b691c5d4249f0cb1ced8ddf01033fbbe10511d9)
file schema: schema
id: OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
bool_col: OPTIONAL BOOLEAN R:0 D:1
tinyint_col: OPTIONAL INT32 L:INTEGER(8,true) R:0 D:1
smallint_col: OPTIONAL INT32 L:INTEGER(16,true) R:0 D:1
int_col: OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
bigint_col: OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
float_col: OPTIONAL FLOAT R:0 D:1
double_col: OPTIONAL DOUBLE R:0 D:1
date_string_col: OPTIONAL BINARY R:0 D:1
string_col: OPTIONAL BINARY R:0 D:1
timestamp_col: OPTIONAL INT96 R:0 D:1
year: OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
month: OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
row group 1: RC:8 TS:754 OFFSET:4
id: INT32 SNAPPY DO:4 FPO:48 SZ:74/73/0.99 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 7, num_nulls: 0]
bool_col: BOOLEAN SNAPPY DO:0 FPO:141 SZ:26/24/0.92 VC:8 ENC:RLE,PLAIN ST:[min: false, max: true, num_nulls: 0]
tinyint_col: INT32 SNAPPY DO:220 FPO:243 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
smallint_col: INT32 SNAPPY DO:343 FPO:366 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
int_col: INT32 SNAPPY DO:467 FPO:490 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
bigint_col: INT64 SNAPPY DO:586 FPO:617 SZ:59/55/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 10, num_nulls: 0]
float_col: FLOAT SNAPPY DO:724 FPO:747 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: -0.0, max: 1.1, num_nulls: 0]
double_col: DOUBLE SNAPPY DO:845 FPO:876 SZ:59/55/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: -0.0, max: 10.1, num_nulls: 0]
date_string_col: BINARY SNAPPY DO:983 FPO:1028 SZ:74/88/1.19 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0x30312F30312F3039, max: 0x30342F30312F3039, num_nulls: 0]
string_col: BINARY SNAPPY DO:1143 FPO:1168 SZ:53/49/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0x30, max: 0x31, num_nulls: 0]
timestamp_col: INT96 SNAPPY DO:1261 FPO:1329 SZ:98/138/1.41 VC:8 ENC:RLE,RLE_DICTIONARY ST:[num_nulls: 0, min/max not defined]
year: INT32 SNAPPY DO:1451 FPO:1470 SZ:47/43/0.91 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 2009, max: 2009, num_nulls: 0]
month: INT32 SNAPPY DO:1563 FPO:1594 SZ:60/56/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 1, max: 4, num_nulls: 0]
Parquet-tools output for one of the lineitem files:
$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/li2/4b4d9143c575dd71-3f69d3cf0001_1879643220_data.0.parq
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: reading another 1 footers
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file: hdfs://localhost:20500/test-warehouse/li2/4b4d9143c575dd71-3f69d3cf0001_1879643220_data.0.parq
creator: impala version 4.0.0-SNAPSHOT (build 7b691c5d4249f0cb1ced8ddf01033fbbe10511d9)
file schema: schema
l_orderkey: OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_partkey: OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_suppkey: OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_linenumber: OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
l_quantity:
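A hedged Python sketch of the distinction this change deals with: a reader that only recognised PLAIN_DICTIONARY as a dictionary-index encoding for data pages must also accept RLE_DICTIONARY. The numeric values follow the Parquet format's Encoding enum; the helper names are hypothetical, and `column_uses_dictionary` simply parses the `ENC:` field printed by `parquet-tools meta` in the output above.

```python
from enum import IntEnum


class Encoding(IntEnum):
    """Subset of the Parquet format's Encoding enum (numeric values from the spec)."""
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE = 3
    BIT_PACKED = 4
    DELTA_BINARY_PACKED = 5
    DELTA_LENGTH_BYTE_ARRAY = 6
    DELTA_BYTE_ARRAY = 7
    RLE_DICTIONARY = 8
    BYTE_STREAM_SPLIT = 9


# Data-page encodings whose values are indexes into the dictionary page.
# PLAIN_DICTIONARY is the old value; RLE_DICTIONARY is the newer one.
DICT_INDEX_ENCODINGS = frozenset({Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY})


def is_dict_index_encoding(enc: Encoding) -> bool:
    return enc in DICT_INDEX_ENCODINGS


def column_uses_dictionary(enc_field: str) -> bool:
    """Check an ENC: field from `parquet-tools meta`, e.g. 'RLE,RLE_DICTIONARY'."""
    for name in enc_field.split(","):
        enc = Encoding.__members__.get(name.strip())  # None for unknown names
        if enc is not None and is_dict_index_encoding(enc):
            return True
    return False


print(column_uses_dictionary("RLE,RLE_DICTIONARY"))  # True  (e.g. id, int_col)
print(column_uses_dictionary("RLE,PLAIN"))           # False (bool_col)
```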
[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16895 ) Change subject: IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7896/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16895 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Gerrit-Change-Number: 16895 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 22 Dec 2020 17:20:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16895 Change subject: IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch .. IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch TestResultSpoolingFetchSize.test_fetch has been flaky in the ubuntu-16.04-dockerised environment because it sometimes does not reach the finished state within 10 seconds. This patch increases the test's timeout to 30 seconds. Testing: - Looped the test locally. Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 --- M tests/query_test/test_result_spooling.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16895/1 -- To view, visit http://gerrit.cloudera.org:8080/16895 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Gerrit-Change-Number: 16895 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 4: > Patch Set 4: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6800/ Hit by IMPALA-9550. Ironically, I closed IMPALA-9550 as not reproducible just yesterday. I will reopen the JIRA. -- To view, visit http://gerrit.cloudera.org:8080/16884 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Dec 2020 15:46:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6800/ -- To view, visit http://gerrit.cloudera.org:8080/16884 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Dec 2020 10:45:30 + Gerrit-HasComments: No