[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

IMPALA-10650: Bailout min/max filters in hash join builder early

This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips every step from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled.

In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check for the presence of min/max stats for every row in the back-end.

Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%.

The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items.

Small Integers: 7.0%
Integers: 4.1%
Big Integers: 4.3%
Strings: 5.6%
Dates: 4.4%
Timestamps: 10.7%
Decimals(4): 10.4%
Decimals(8): 9.1%

In addition, the min/max stats for pages are read in batches, with a fast track version for the column types int32_t, int64_t, float, double and date, which have the same storage format as Parquet. For a row group, the page locations are read only once, instead of once for every page skipped, resulting in a 100x speedup when a subset of 199 pages is skipped.

Testing:
1. Ran core test successfully;
2. Ran TPCDS performance tests.
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Reviewed-on: http://gerrit.cloudera.org:8080/17295 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 17 files changed, 984 insertions(+), 297 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 34 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
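The early bail-out described in the commit message can be illustrated with a small sketch. This is not the code from the change: the class SketchMinMaxFilter, the function BuildWithBailout, and the usefulness test (the range is considered useless once it is wider than an assumed threshold) are all hypothetical. The commit message only states that usefulness is measured periodically and that 'always_true_' short-circuits later insertions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a min/max runtime filter over int64 values.
class SketchMinMaxFilter {
 public:
  // 'max_useful_width' is an assumed threshold: a wider range is treated as useless.
  explicit SketchMinMaxFilter(uint64_t max_useful_width)
    : max_useful_width_(max_useful_width) {}

  void Insert(int64_t v) {
    if (always_true_) return;  // Bailed out: skip all per-row work.
    min_ = std::min(min_, v);
    max_ = std::max(max_, v);
    ++num_updates_;
  }

  // Meant to be called periodically by the builder, not for every row.
  void CheckUsefulness() {
    if (always_true_ || min_ > max_) return;  // Already bailed out, or still empty.
    // Unsigned subtraction yields the exact width without signed overflow.
    const uint64_t width = static_cast<uint64_t>(max_) - static_cast<uint64_t>(min_);
    if (width > max_useful_width_) always_true_ = true;
  }

  bool always_true() const { return always_true_; }
  size_t num_updates() const { return num_updates_; }

 private:
  const uint64_t max_useful_width_;
  int64_t min_ = std::numeric_limits<int64_t>::max();
  int64_t max_ = std::numeric_limits<int64_t>::lowest();
  bool always_true_ = false;
  size_t num_updates_ = 0;
};

// Inserts 'rows' and re-checks usefulness after every 'batch_size' rows.
void BuildWithBailout(const std::vector<int64_t>& rows, size_t batch_size,
    SketchMinMaxFilter* filter) {
  assert(batch_size > 0);
  for (size_t i = 0; i < rows.size(); ++i) {
    filter->Insert(rows[i]);
    if ((i + 1) % batch_size == 0) filter->CheckUsefulness();
  }
}
```

In this sketch, rows inserted after the bail-out cost a single flag test, which is the effect the commit message attributes to 'always_true_'.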
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 33: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 33 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 25 May 2021 14:02:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 33: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7172/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 33 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 25 May 2021 08:08:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 33: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7170/ -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 33 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 24 May 2021 20:13:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 33: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7170/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 33 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 24 May 2021 14:15:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 33: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 33 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 24 May 2021 14:15:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 32: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 32 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 24 May 2021 14:14:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 32: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8768/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 32 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 18:55:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

Patch Set 32:

Here is the clean sql execution after the fix.

Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 603d1a8fd0946b32e76807bb4c505171f87f7881)

Query: drop table if exists min_max_filter_large_strings1
+-------------------------+
| summary                 |
+-------------------------+
| Table has been dropped. |
+-------------------------+
Fetched 1 row(s) in 4.94s

Query: drop table if exists min_max_filter_large_strings2
+-------------------------+
| summary                 |
+-------------------------+
| Table has been dropped. |
+-------------------------+
Fetched 1 row(s) in 3.99s

Query: create table min_max_filter_large_strings1 (string_col string primary key) stored as kudu
+-------------------------+
| summary                 |
+-------------------------+
| Table has been created. |
+-------------------------+
WARNINGS: Unpartitioned Kudu tables are inefficient for large data sizes.
Fetched 1 row(s) in 0.56s

Query: insert into min_max_filter_large_strings1 values (''), ('bbbc'), (''),
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#32). ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

IMPALA-10650: Bailout min/max filters in hash join builder early

This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips every step from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled.

In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check for the presence of min/max stats for every row in the back-end.

Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%.

The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items.

Small Integers: 7.0%
Integers: 4.1%
Big Integers: 4.3%
Strings: 5.6%
Dates: 4.4%
Timestamps: 10.7%
Decimals(4): 10.4%
Decimals(8): 9.1%

In addition, the min/max stats for pages are read in batches, with a fast track version for the column types int32_t, int64_t, float, double and date, which have the same storage format as Parquet. For a row group, the page locations are read only once, instead of once for every page skipped, resulting in a 100x speedup when a subset of 199 pages is skipped.

Testing:
1. Ran core test successfully;
2. Ran TPCDS performance tests.
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 17 files changed, 984 insertions(+), 297 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/32 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 32 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
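As an illustration of what "branch prediction compiler hints" can look like, here is a minimal sketch built on the GCC/Clang builtin __builtin_expect. The macro SKETCH_UNLIKELY and the struct SketchIntRange are hypothetical names; the commit message does not say which branches in Insert() are annotated, so the choice below (treating a widening of the range as the cold path) is an assumption.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical hint macro; falls back to a plain condition on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

// Hypothetical int32 min/max accumulator.
struct SketchIntRange {
  int32_t min = std::numeric_limits<int32_t>::max();
  int32_t max = std::numeric_limits<int32_t>::lowest();

  // Assumption: after the first few values, most inputs fall inside
  // [min, max], so both updates are marked as unlikely.
  void Insert(int32_t v) {
    if (SKETCH_UNLIKELY(v < min)) min = v;
    if (SKETCH_UNLIKELY(v > max)) max = v;
  }
};
```

The hint changes only how the compiler lays out the branches, not the result: inserting 5, -3 and 2 leaves min at -3 and max at 5 with or without it.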
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 31: Seems like TestMinMaxFilters.test_large_strings timed out: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13947/testReport/ https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4238/testReport/ -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 08:59:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 31: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7168/ -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 May 2021 22:22:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 31: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 May 2021 15:40:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 31: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7168/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 May 2021 15:40:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

Patch Set 30: Code-Review+2

(1 comment)

Looks great, thanks for doing this improvement!

http://gerrit.cloudera.org:8080/#/c/17295/30/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17295/30/be/src/exec/parquet/hdfs-parquet-scanner.cc@1170
PS30, Line 1170: // Try to detect the longest span of non-null pages and batch read min/max stats for
: // it. If a null-page is found, skip it right away and continue.
Btw this is really nice!

-- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 30 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 May 2021 15:40:01 + Gerrit-HasComments: Yes
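The span detection praised in the comment above can be sketched roughly as follows. This is an illustration only, not the scanner code: the function NonNullSpans is a hypothetical name, and the input is assumed to be a per-page list of null flags such as the null_pages field of a Parquet column index. Each returned range is a maximal run of non-null pages whose min/max stats could then be read in one batch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns half-open [begin, end) page index ranges covering every maximal
// run of non-null pages. Null pages are skipped right away.
std::vector<std::pair<size_t, size_t>> NonNullSpans(
    const std::vector<bool>& null_pages) {
  std::vector<std::pair<size_t, size_t>> spans;
  const size_t n = null_pages.size();
  size_t i = 0;
  while (i < n) {
    if (null_pages[i]) {
      ++i;  // Null page: nothing to read, move on.
      continue;
    }
    const size_t begin = i;
    while (i < n && !null_pages[i]) ++i;  // Extend the span as far as possible.
    spans.emplace_back(begin, i);
  }
  return spans;
}
```

For the flags {false, false, true, false}, the sketch yields the spans [0, 2) and [3, 4).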
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 30: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8760/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 30 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 May 2021 15:33:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#30). ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

IMPALA-10650: Bailout min/max filters in hash join builder early

This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips every step from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled.

In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check for the presence of min/max stats for every row in the back-end.

Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%.

The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items.

Small Integers: 7.0%
Integers: 4.1%
Big Integers: 4.3%
Strings: 5.6%
Dates: 4.4%
Timestamps: 10.7%
Decimals(4): 10.4%
Decimals(8): 9.1%

In addition, the min/max stats for pages are read in batches, with a fast track version for the column types int32_t, int64_t, float, double and date, which have the same storage format as Parquet. For a row group, the page locations are read only once, instead of once for every page skipped, resulting in a 100x speedup when a subset of 199 pages is skipped.

Testing:
1. Ran core test successfully;
2. Ran TPCDS performance tests.
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 17 files changed, 983 insertions(+), 297 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/30 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 30 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
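The point about reading page locations once per row group can be pictured with the sketch below. Everything in it is hypothetical (SketchPageLocation, SketchOffsetIndexReader and both functions), and it measures nothing: it only contrasts reloading the page locations for every skipped page with loading them a single time and reusing the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified page location record.
struct SketchPageLocation {
  int64_t offset;
  int32_t compressed_size;
};

// Stand-in for reading a row group's page locations; counts how often it runs.
struct SketchOffsetIndexReader {
  size_t num_loads = 0;

  std::vector<SketchPageLocation> Load(size_t num_pages) {
    ++num_loads;
    std::vector<SketchPageLocation> locs(num_pages);
    for (size_t i = 0; i < num_pages; ++i) {
      locs[i] = {static_cast<int64_t>(i) * 1024, 1024};
    }
    return locs;
  }
};

// Shape to avoid: the page locations are re-read for every skipped page.
int64_t SkippedBytesReloading(SketchOffsetIndexReader* reader, size_t num_pages,
    const std::vector<size_t>& skipped) {
  int64_t bytes = 0;
  for (size_t page : skipped) {
    assert(page < num_pages);
    bytes += reader->Load(num_pages)[page].compressed_size;
  }
  return bytes;
}

// Preferred shape: read the page locations once per row group and reuse them.
int64_t SkippedBytesLoadOnce(SketchOffsetIndexReader* reader, size_t num_pages,
    const std::vector<size_t>& skipped) {
  const std::vector<SketchPageLocation> locs = reader->Load(num_pages);
  int64_t bytes = 0;
  for (size_t page : skipped) {
    assert(page < locs.size());
    bytes += locs[page].compressed_size;
  }
  return bytes;
}
```

Both functions return the same total; only the number of Load() calls differs, growing with the number of skipped pages in the first and staying at one in the second.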
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 29: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8746/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 00:18:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

IMPALA-10650: Bailout min/max filters in hash join builder early

This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips every step from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled.

In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check for the presence of min/max stats for every row in the back-end.

Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%.

The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items.

Small Integers: 7.0%
Integers: 4.1%
Big Integers: 4.3%
Strings: 5.6%
Dates: 4.4%
Timestamps: 10.7%
Decimals(4): 10.4%
Decimals(8): 9.1%

In addition, the min/max stats for pages are read in batches, with a fast track version for the column types int32_t, int64_t, float, double and date, which have the same storage format as Parquet. For a row group, the page locations are read only once, instead of once for every page skipped, resulting in a 100x speedup when a subset of 199 pages is skipped.

Testing:
1. Ran core test successfully;
2. Ran TPCDS performance tests.
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 17 files changed, 979 insertions(+), 297 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/29 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

Patch Set 28:

(1 comment)

Addressed Riza's comment and fixed one FE null pointer exception.

http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc@335
PS28, Line 335: if (filter_ctxs_.size() == 0) return;
> Looks like we can remove this branch?
Done

-- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 23:56:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 28: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8744/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 18:11:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

Patch Set 28:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@337
PS27, Line 337: for (std::vector::const_iterator it = minmax_filter_ctxs_.begin();
> Sounds like a good idea.
Looks good, thanks!

http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@404
PS27, Line 404: }
> Partial filter publishing with propagation to scan nodes may be a little bi
Makes sense. I suppose execution nodes consuming the minmax filters also still need to wait for the remaining filters to arrive before they start reading.

http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc@335
PS28, Line 335: if (filter_ctxs_.size() == 0) return;
Looks like we can remove this branch? If minmax_filter_ctxs_ is empty, the loop below will stop immediately.

-- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 18:09:21 + Gerrit-HasComments: Yes
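The last comment rests on the fact that an iterator loop over an empty container runs zero times, so an emptiness check in front of it changes nothing. A minimal sketch of that reasoning, using the hypothetical names SketchFilterCtx, VisitGuarded and VisitUnguarded rather than the builder's real types:

```cpp
#include <vector>

// Hypothetical stand-in for a filter context.
struct SketchFilterCtx {
  bool visited = false;
};

// With the early return, as in the reviewed line.
int VisitGuarded(std::vector<SketchFilterCtx>& ctxs) {
  if (ctxs.size() == 0) return 0;
  int visited = 0;
  for (std::vector<SketchFilterCtx>::iterator it = ctxs.begin(); it != ctxs.end(); ++it) {
    it->visited = true;
    ++visited;
  }
  return visited;
}

// Without it: for an empty vector begin() == end(), so the body never runs.
int VisitUnguarded(std::vector<SketchFilterCtx>& ctxs) {
  int visited = 0;
  for (std::vector<SketchFilterCtx>::iterator it = ctxs.begin(); it != ctxs.end(); ++it) {
    it->visited = true;
    ++visited;
  }
  return visited;
}
```

Both versions behave identically for empty and non-empty inputs, which is why the guard could be dropped, provided nothing after the loop depends on the early return.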
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#28). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in population min/max filters in the hash join builder by periodically measuring the usefulness of each filter and set the 'always_true_' flag accordingly. Once set to true, the insertion to such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats is present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in back-end. Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%. The Insert() methods are optimized with branch prediction compiler hints which yield the following improvement when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings:5.6% Dates: 4.4% Timestamps:10.7% Decimals(4): 10.4% Decimals(8):9.1% In addition, the min/max stats for pages are read in batches with a fast track version for column types of int32_t, int64_t, float, double and date that have identical storage format as Parquet. For a row group, the page locations are read only once, instead of once for every page skipped, resulting in 100x speedup when a subset of 199 pages are skipped. Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 17 files changed, 977 insertions(+), 297 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/28 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 27: (14 comments) Answer Riza and Zoltan's review comments. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.h@155 PS27, Line 155: static bool ShouldRejectFilterBasedOnColumnStats( > nit: I'm OK with reformatting, but I'm not sure if it was intended Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.cc@231 PS27, Line 231: example > Could you please update this example? Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@984 PS27, Line 984: scalar_reader->offset_index_ > nit: simply 'offset_index' from at L978? Good catch! Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@997 PS27, Line 997: DCHECK > nit: DCHECK_GE could be used. It has the advantage that in case of failure Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@1010 PS27, Line 1010: Expected > Any reason why we don't want the error message to be logged here? 
Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@1107 PS27, Line 1107: > nit: probably unintended space Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.cc File be/src/exec/parquet/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.cc@281 PS27, Line 281: const int remainder = num_values % batch; > nit: do we need to calculate remainder? In the second for-loop we could hav Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.cc@321 PS27, Line 321: const int remainder = num_values % batch; > nit: same as above Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.inline.h File be/src/exec/parquet/parquet-column-stats.inline.h: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.inline.h@143 PS27, Line 143: DCHECK(buffer.size() == sizeof(int32_t)); : DCHECK(parquet_type == parquet::Type::INT32); > nit: DCHECK_EQ Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.inline.h@159 PS27, Line 159: DCHECK(buffer.size() == sizeof(int32_t)); : DCHECK(parquet_type == parquet::Type::INT32); > nit: DCHECK_EQ Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@337 PS27, Line 337: for (const FilterContext& ctx : filter_ctxs_) { > I wonder if we can speed this up by iterating ONLY the minmax filters. Sounds like a good idea. A new vector, minmax_filter_ctxs_, is added to cache the local min/max filter contexts. An element is removed from it once it is set to AlwaysTrue, so it is not considered for the overlap check again.
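For readers following the thread, the idea in the reply above (cache only the min/max filter contexts and drop each one once it becomes always-true) might look roughly like the sketch below. This is not the code from the patch: SketchFilterCtx, CollectMinMax, PruneNotUseful and the is_useful predicate are hypothetical stand-ins invented for illustration, and the real usefulness check in Impala is more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a filter context; NOT Impala's FilterContext.
struct SketchFilterCtx {
  int id;
  bool is_minmax;
  bool always_true;
};

// Collect pointers to the min/max filters once, so that later passes never
// touch the other filter kinds. 'all' must outlive the returned vector.
std::vector<SketchFilterCtx*> CollectMinMax(std::vector<SketchFilterCtx>& all) {
  std::vector<SketchFilterCtx*> out;
  for (SketchFilterCtx& ctx : all) {
    if (ctx.is_minmax) out.push_back(&ctx);
  }
  return out;
}

// One usefulness pass. 'is_useful' is a caller-supplied predicate (an
// assumption of this sketch). Filters found not useful are marked
// always-true and erased, so the next pass iterates fewer elements.
// If 'minmax' is empty the loop body never runs, so no separate
// emptiness check is needed before calling this.
template <typename Pred>
void PruneNotUseful(std::vector<SketchFilterCtx*>& minmax, Pred is_useful) {
  for (auto it = minmax.begin(); it != minmax.end();) {
    if (!is_useful(**it)) {
      (*it)->always_true = true;
      it = minmax.erase(it);  // erase returns the iterator to the next element
    } else {
      ++it;
    }
  }
}
```

Erasing from the middle of a vector is linear in its length, which is presumably acceptable here because the number of filters per join is small.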
http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@345 PS27, Line 345: not_useful = false; > nit: I think it'd be a bit more readable if we decrease the negations, i.e. Done http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@404 PS27, Line 404: PublishRuntimeFilters(num_build_rows); > It seems to me that PublishRuntimeFilters is only called here in FinalizeBu Partial filter publishing with propagation to scan nodes may be a little bit complicated since it involves network traffic and context management. See PhjBuilder::FinalizeBuild(). With the work optimizing the insertion to an already disabled filter, and the work to only iterate over enabled filters for overlap checking, it looks like we can live with the current publishing strategy. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/util/min-max-filter-ir.cc File be/src/util/min-max-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/util/min-max-filter-ir.cc@114 PS27, Line 114: predicion > nit: prediction Done -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 27: (12 comments) Found a few nits, but looks good overall. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.h@155 PS27, Line 155: static bool ShouldRejectFilterBasedOnColumnStats( nit: I'm OK with reformatting, but I'm not sure if it was intended http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/filter-context.cc@231 PS27, Line 231: example Could you please update this example? http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@984 PS27, Line 984: scalar_reader->offset_index_ nit: simply 'offset_index' from at L978? http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@997 PS27, Line 997: DCHECK nit: DCHECK_GE could be used. It has the advantage that in case of failure it prints the actual values. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@1010 PS27, Line 1010: Expected Any reason why we don't want the error message to be logged here? 
http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/hdfs-parquet-scanner.cc@1107 PS27, Line 1107: nit: probably unintended space http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.cc File be/src/exec/parquet/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.cc@281 PS27, Line 281: const int remainder = num_values % batch; nit: do we need to calculate remainder? In the second for-loop we could have for (int i = pos; i < num_values; ++i) Or, use pos itself: for (; pos < num_values; ++pos) http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.cc@321 PS27, Line 321: const int remainder = num_values % batch; nit: same as above http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.inline.h File be/src/exec/parquet/parquet-column-stats.inline.h: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.inline.h@143 PS27, Line 143: DCHECK(buffer.size() == sizeof(int32_t)); : DCHECK(parquet_type == parquet::Type::INT32); nit: DCHECK_EQ http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/parquet/parquet-column-stats.inline.h@159 PS27, Line 159: DCHECK(buffer.size() == sizeof(int32_t)); : DCHECK(parquet_type == parquet::Type::INT32); nit: DCHECK_EQ http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@345 PS27, Line 345: not_useful = false; nit: I think it'd be a bit more readable if we decrease the negations, i.e. only call the variable 'useful'. 
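The remainder nit on parquet-column-stats.cc above can be illustrated with a small standalone sketch. The function below is hypothetical: it sums integers instead of decoding page min/max stats, and SumInBatches is not a name from the patch. It only shows that continuing from pos handles the tail without computing num_values % batch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Processes 'values' in fixed-size batches, then finishes the tail by
// continuing from 'pos'. No separate 'remainder' variable is needed.
int64_t SumInBatches(const std::vector<int32_t>& values, int batch) {
  assert(batch > 0);
  const int num_values = static_cast<int>(values.size());
  int64_t total = 0;
  int pos = 0;
  // Full batches: runs only while a whole batch still fits.
  for (; pos + batch <= num_values; pos += batch) {
    for (int j = 0; j < batch; ++j) total += values[pos + j];
  }
  // Tail: whatever is left after the last full batch, possibly nothing.
  for (; pos < num_values; ++pos) total += values[pos];
  return total;
}
```

With seven values and a batch of three, the first loop covers positions 0 through 5 and the second loop picks up position 6.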
http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/util/min-max-filter-ir.cc File be/src/util/min-max-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/util/min-max-filter-ir.cc@114 PS27, Line 114: predicion nit: prediction -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 15:23:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 27: (2 comments) Hi Qifan, I wonder if we can improve the minmax filter performance from the build side. I have the following questions and comments. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@337 PS27, Line 337: for (const FilterContext& ctx : filter_ctxs_) { I wonder if we can speed this up by iterating ONLY the minmax filters. Maybe copy references to the minmax filters into a separate vector? This function seems to be called frequently, on every PhjBuilder::AddBatch. I imagine that if minmax filters are enabled, only half of the filter_ctxs_ elements are actually minmax filters. We can also pop a filter out of the vector once it is deemed not useful, thereby speeding up the next iteration. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@404 PS27, Line 404: PublishRuntimeFilters(num_build_rows); It seems to me that PublishRuntimeFilters is only called here in FinalizeBuild (I assume near the end of the build process). Since a minmax filter can be disabled quickly after reading a few early RowBatches, should we consider publishing disabled filters as soon as possible? Say, immediately publish a disabled minmax filter from PhjBuilder::DetermineUsefulnessForMinmaxFilters()?
-- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 02:52:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 27: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8739/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 01:41:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in population min/max filters in the hash join builder by periodically measuring the usefulness of each filter and set the 'always_true_' flag accordingly. Once set to true, the insertion to such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats is present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in back-end. Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%. The Insert() methods are optimized with branch prediction compiler hints which yield the following improvement when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings:5.6% Dates: 4.4% Timestamps:10.7% Decimals(4): 10.4% Decimals(8):9.1% In addition, the min/max stats for pages are read in batches with a fast track version for column types of int32_t, int64_t, float, double and date that have identical storage format as Parquet. For a row group, the page locations are read only once, instead of once for every page skipped, resulting in 100x speedup when a subset of 199 pages are skipped. Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 18 files changed, 871 insertions(+), 259 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/27 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 26: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8737/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 26 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 17 May 2021 20:50:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in population min/max filters in the hash join builder by periodically measuring the usefulness of each filter and set the 'always_true_' flag accordingly. Once set to true, the insertion to such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats is present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in back-end. Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%. The Insert() methods are optimized with branch prediction compiler hints which yield the following improvement when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings:5.6% Dates: 4.4% Timestamps:10.7% Decimals(4): 10.4% Decimals(8):9.1% In addition, the min/max stats for pages are read in batches with a fast track version for column types of int32_t, int64_t, float, double and date that have identical storage format as Parquet. For a row group, the page locations are read only once, instead of once for every page skipped. Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 18 files changed, 875 insertions(+), 259 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/26 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 26 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 25: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8719/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 14 May 2021 13:38:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in population min/max filters in the hash join builder by periodically measuring the usefulness of each filter and set the 'always_true_' flag accordingly. Once set to true, the insertion to such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats is present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in back-end. Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%. The Insert() methods are optimized with branch prediction compiler hints which yield the following improvement when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings:5.6% Dates: 4.4% Timestamps:10.7% Decimals(4): 10.4% Decimals(8):9.1% The min/max stats for pages are read in batches with the fast track version for column types that have identical storage format as Parquet. Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 18 files changed, 866 insertions(+), 256 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/25 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 24: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8715/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 24 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 13 May 2021 19:09:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 23: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8714/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 23 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 13 May 2021 18:51:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in population min/max filters in the hash join builder by periodically measuring the usefulness of each filter and set the 'always_true_' flag accordingly. Once set to true, the insertion to such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats is present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in back-end. Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves 13%, and the step for join node #8 in TPCDS Q16 improves 3.2%. The Insert() methods are optimized with branch prediction compiler hints which yield the following improvement when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings:5.6% Dates: 4.4% Timestamps:10.7% Decimals(4): 10.4% Decimals(8):9.1% The min/max stats for pages are read in batches with the fast track version for column types that have identical storage format as Parquet. Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 18 files changed, 864 insertions(+), 256 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/24 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 24 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#23). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in the back-end. Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves by 13%, and the step for join node #8 in TPCDS Q16 improves by 3.2%. The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings: 5.6% Dates: 4.4% Timestamps: 10.7% Decimals(4): 10.4% Decimals(8): 9.1% The min/max stats for pages are read in batches, with a fast-track version for column types that have the same storage format as Parquet. Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 18 files changed, 918 insertions(+), 257 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/23 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 23 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

Patch Set 20:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@329
PS8, Line 329: DetermineUsefulnessForMinmaxFilters();
> The method FilterContext::ShouldRejectFilterBasedOnColumnStats() is calle
Yeah, it's O(1) in complexity. But unpredictable branches are harmful for modern CPUs that have pipelines. It seems we don't need over-optimization on this since it's not in the hot path. Let's skip this.

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@351
PS8, Line 351: VLOG(3)
> Double checked with the ClangFormat which produces the above spacing.
Hmm.. I'm following this: https://google.github.io/styleguide/cppguide.html#Function_Calls. It's used by our C++ style: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 Maybe clang-format didn't check the continuation indent width. But existing code (e.g. lines 391 and 399 in this PS) uses a 4-space indent width.

http://gerrit.cloudera.org:8080/#/c/17295/20/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/17295/20/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@267
PS20, Line 267: minMaxValuePresent
I have a newbie question for the min-max filter: do we have a way (e.g. a query option) to disable using the min-max stats if users find they are stale?
-- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sat, 08 May 2021 08:12:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8671/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 30 Apr 2021 18:16:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8670/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 30 Apr 2021 18:07:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in the back-end. Early bail out improves the HJ builder step in general. For example, the step for join node #11 in TPCDS Q8 improves by 13%, and the step for join node #8 in TPCDS Q16 improves by 3.2%. The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings: 5.6% Dates: 4.4% Timestamps: 10.7% Decimals(4): 10.4% Decimals(8): 9.1% Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 13 files changed, 485 insertions(+), 238 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/20 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
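The early bail-out described in the commit message above might look roughly like the following. This is a sketch under stated assumptions, not Impala's implementation: `SketchMinMaxFilter`, its `threshold` parameter, and the "fraction of the column range covered" policy are all hypothetical. Only the idea of periodically measuring usefulness and then short-circuiting `Insert()` through an always-true flag comes from the change description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a min/max runtime filter over int64 values.
class SketchMinMaxFilter {
 public:
  // col_min/col_max: column-level stats, assumed to be supplied by the plan.
  // threshold: fraction of the column range above which the filter is judged
  // useless (an assumed policy, not taken from the patch).
  SketchMinMaxFilter(int64_t col_min, int64_t col_max, double threshold)
    : col_min_(col_min), col_max_(col_max), threshold_(threshold) {
    assert(col_min <= col_max);
  }

  // Per-row insertion. Once the filter is always true, this is a single
  // flag test and the range is no longer maintained.
  void Insert(int64_t value) {
    if (always_true_) return;
    min_ = std::min(min_, value);
    max_ = std::max(max_, value);
  }

  // Meant to be called periodically by the builder (for example once per row
  // batch) rather than once per row.
  void DetermineUsefulness() {
    if (always_true_ || min_ > max_) return;  // already bailed out, or empty
    const double covered = static_cast<double>(max_) - static_cast<double>(min_);
    const double total =
        static_cast<double>(col_max_) - static_cast<double>(col_min_);
    if (total <= 0 || covered / total > threshold_) always_true_ = true;
  }

  bool AlwaysTrue() const { return always_true_; }
  int64_t Max() const { return max_; }

 private:
  const int64_t col_min_;
  const int64_t col_max_;
  const double threshold_;
  int64_t min_ = std::numeric_limits<int64_t>::max();
  int64_t max_ = std::numeric_limits<int64_t>::lowest();
  bool always_true_ = false;
};
```

Checking usefulness per batch instead of per row keeps the extra work off the per-row path, which is the property the review discussion around `DetermineUsefulnessForMinmaxFilters()` is concerned with.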
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in the back-end. The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings: 5.6% Dates: 4.4% Timestamps: 10.7% Decimals(4): 10.4% Decimals(8): 9.1% Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 13 files changed, 485 insertions(+), 238 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/19 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8665/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 29 Apr 2021 20:12:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in the back-end. The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings: 5.6% Dates: 4.4% Timestamps: 10.7% Decimals(4): 10.4% Decimals(8): 9.1% Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 13 files changed, 410 insertions(+), 164 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/18 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
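To make the "branch prediction compiler hints" in the commit message above concrete, here is a small sketch of how such hints are commonly written with the GCC/Clang builtin `__builtin_expect`. The macro names `SKETCH_LIKELY`/`SKETCH_UNLIKELY` and the `HintedMinMax` struct are hypothetical; which branches the patch actually annotates is not stated in this thread, so the choices below are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hint macros. __builtin_expect is a GCC/Clang builtin; other compilers fall
// back to the plain expression. The hints affect code layout only, never the
// result of the condition.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_UNLIKELY(x) (x)
#endif

// Hypothetical min/max tracker for 32-bit integers.
struct HintedMinMax {
  int32_t min = std::numeric_limits<int32_t>::max();
  int32_t max = std::numeric_limits<int32_t>::lowest();

  void Insert(const int32_t* val) {
    // Assumption: NULL values are rare on the build side.
    if (SKETCH_UNLIKELY(val == nullptr)) return;
    // Assumption: after the first few rows the range stabilizes, so most
    // values fall inside [min, max] and neither update runs.
    if (SKETCH_UNLIKELY(*val < min)) min = *val;
    if (SKETCH_UNLIKELY(*val > max)) max = *val;
    assert(min <= max);  // holds after any non-NULL insertion
  }
};
```

Whether a hint helps depends on the data: if the annotated branch turns out to be taken often, the hint can hurt, which is why gains like those listed in the commit message need to be measured rather than assumed.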
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8649/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 27 Apr 2021 17:29:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8648/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 16 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 27 Apr 2021 17:17:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8647/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 27 Apr 2021 17:14:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each filter and setting the 'always_true_' flag accordingly. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row in the back-end. The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings: 5.6% Dates: 4.4% Timestamps: 10.7% Decimals(4): 10.4% Decimals(8): 9.1% Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 393 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/17 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
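The role of the 'is_min_max_value_present' flag described above can be sketched as hoisting a presence check out of the per-row loop. Everything below other than the flag name is an assumption made for illustration: `TargetDescSketch`, its `low_value`/`high_value` fields, and `CountRowsInColumnRange` are hypothetical and do not mirror the real Thrift struct or back-end code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a filter target descriptor. The flag is
// assumed to be computed once at planning time.
struct TargetDescSketch {
  bool is_min_max_value_present = false;
  int64_t low_value = 0;   // only meaningful when the flag is true
  int64_t high_value = 0;  // only meaningful when the flag is true
};

// Counts the rows that fall inside the column's known [low, high] range.
// The presence of stats is tested once, before the loop, instead of being
// re-derived for every row.
inline size_t CountRowsInColumnRange(
    const TargetDescSketch& desc, const std::vector<int64_t>& rows) {
  // Without stats nothing can be ruled out, so every row counts.
  if (!desc.is_min_max_value_present) return rows.size();
  assert(desc.low_value <= desc.high_value);
  size_t in_range = 0;
  for (int64_t v : rows) {
    if (v >= desc.low_value && v <= desc.high_value) ++in_range;
  }
  return in_range;
}
```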
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting the 'always_true_' flag to true. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row at runtime. The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements when tested with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings: 5.6% Dates: 4.4% Timestamps: 10.7% Decimals(4): 10.4% Decimals(8): 9.1% Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 393 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/16 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 16 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting the 'always_true_' flag to true. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row at runtime. The Insert() methods are optimized with branch prediction compiler hints, which yield the following improvements with the insertion of 1 randomly generated items. Small Integers: 7.0% Integers: 4.1% Big Integers: 4.3% Strings: 5.6% Dates: 4.4% Timestamps: 10.7% Decimals(4): 10.4% Decimals(8): 9.1% Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 393 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/15 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8642/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 14 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 26 Apr 2021 18:26:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting the 'always_true_' flag to true. Once the flag is set to true, insertion into such a filter completely skips the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row at runtime. The Insert() methods are optimized with branch prediction compiler hints, which yield a 4% to 7% improvement for common SQL integer types. Testing: 1. Ran core test; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 370 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/14 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 14 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8633/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 13 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Apr 2021 23:14:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting the 'always_true_' flag to true. Once the flag is set to true, insertion into such a filter will completely skip the steps from evaluating the value from a row to verifying the value against the min/max range. This optimization is LLVM-enabled. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row at runtime. Testing: 1. Ran core test; 2. Ran performance test (TBD). Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 368 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/13 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 13 Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
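The early bail-out described in the commit message can be illustrated with a minimal sketch. This is not the code from the patch: SketchMinMaxFilter, OverlapRatio(), BuildFilter(), the batch size, and the threshold are all hypothetical names and values chosen for illustration, and the usefulness test (the fraction of the column's min/max range covered by the filter) is an assumption about how such a check might work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, simplified stand-in for an integer min/max filter.
class SketchMinMaxFilter {
 public:
  bool AlwaysTrue() const { return always_true_; }
  void SetAlwaysTrue() { always_true_ = true; }
  int64_t min() const { return min_; }
  int64_t max() const { return max_; }

  // Widens [min_, max_] to cover 'v'. Once always_true_ is set, the
  // insertion returns immediately and does no further work.
  void Insert(int64_t v) {
    if (always_true_) return;
    if (v < min_) min_ = v;
    if (v > max_) max_ = v;
  }

  // Fraction of the column range [col_min, col_max] that overlaps the
  // filter range. A value close to 1.0 means the filter rejects very little.
  double OverlapRatio(int64_t col_min, int64_t col_max) const {
    if (min_ > max_ || col_min > col_max) return 0.0;  // empty filter or range
    const int64_t lo = std::max(min_, col_min);
    const int64_t hi = std::min(max_, col_max);
    if (hi < lo) return 0.0;
    const double overlap = static_cast<double>(hi) - static_cast<double>(lo) + 1.0;
    const double range =
        static_cast<double>(col_max) - static_cast<double>(col_min) + 1.0;
    return overlap / range;
  }

 private:
  bool always_true_ = false;
  int64_t min_ = std::numeric_limits<int64_t>::max();
  int64_t max_ = std::numeric_limits<int64_t>::lowest();
};

// Inserts 'rows' batch by batch. After each batch the usefulness of the
// filter is re-measured; if it covers more than 'threshold' of the column
// range, it is marked always-true and the remaining rows are skipped.
void BuildFilter(const std::vector<int64_t>& rows, size_t batch_size,
                 int64_t col_min, int64_t col_max, double threshold,
                 SketchMinMaxFilter* filter) {
  assert(batch_size > 0);
  for (size_t start = 0; start < rows.size(); start += batch_size) {
    if (filter->AlwaysTrue()) return;
    const size_t end = std::min(rows.size(), start + batch_size);
    for (size_t i = start; i < end; ++i) filter->Insert(rows[i]);
    if (filter->OverlapRatio(col_min, col_max) > threshold) {
      filter->SetAlwaysTrue();
    }
  }
}
```

For example, calling BuildFilter({1, 100, 50, 7}, 2, 0, 100, 0.9, &f) marks the filter always-true after the first batch, so the rows 50 and 7 are never inserted. The measurement runs once per batch rather than once per row, which keeps its cost small relative to the insertions it may save.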
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8625/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 12 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Apr 2021 01:14:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting its 'always_true_' flag to true when the filter is found not to be useful. Once the flag is set, insertion into such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-codegen'd. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row at runtime. Testing: 1. Ran core test; 2. Ran performance test (TBD). Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 321 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/12 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 12 Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 11: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8623/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 11 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Apr 2021 20:14:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting its 'always_true_' flag to true when the filter is found not to be useful. Once the flag is set, insertion into such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-codegen'd. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row at runtime. Testing: 1. Ran core test; 2. Ran performance test (TBD). Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 321 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/11 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 11 Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 10: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8621/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 10 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Apr 2021 19:02:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting its 'always_true_' flag to true when the filter is found not to be useful. Once the flag is set, insertion into such a filter completely skips the steps from the evaluation of the value from a row to the verification of the value in the min/max range. This optimization is LLVM-codegen'd. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check the presence of min/max stats for every row at runtime. Testing: 1. Ran core test; 2. Ran performance test (TBD). Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 323 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/10 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 10 Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8619/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Apr 2021 00:05:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting its 'always_true_' flag to true when the filter is found not to be useful. When inserting into a filter whose 'always_true_' flag is true, the steps from the evaluation of the value from a row to the verification of the value in the min/max range are skipped completely. The above optimization is also LLVM-codegen'd. Testing: 1. Ran core test; 2. Ran performance test (TBD). Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h 9 files changed, 179 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/9 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

Patch Set 8:

(11 comments)

I think this is a nice improvement. Looking forward to the performance results! BTW, sorry for leaving lots of code style comments.

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/codegen/gen_ir_descriptions.py
File be/src/codegen/gen_ir_descriptions.py:

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/codegen/gen_ir_descriptions.py@235
PS8, Line 235: ["BOOL_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala16BoolMinMaxFilter10AlwaysTrueEv"],
: ["TINYINT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala19TinyIntMinMaxFilter10AlwaysTrueEv"],
: ["SMALLINT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala20SmallIntMinMaxFilter10AlwaysTrueEv"],
: ["INT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala15IntMinMaxFilter10AlwaysTrueEv"],
: ["BIGINT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala18BigIntMinMaxFilter10AlwaysTrueEv"],
: ["FLOAT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala17FloatMinMaxFilter10AlwaysTrueEv"],
: ["DOUBLE_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala18DoubleMinMaxFilter10AlwaysTrueEv"],
: ["STRING_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala18StringMinMaxFilter10AlwaysTrueEv"],
: ["TIMESTAMP_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala21TimestampMinMaxFilter10AlwaysTrueEv"],
: ["DATE_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala16DateMinMaxFilter10AlwaysTrueEv"],
: ["DECIMAL_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala19DecimalMinMaxFilter10AlwaysTrueEv"],
Just curious, what errors will we encounter if we don't make the AlwaysTrue() method virtual and use _ZNK6impala12MinMaxFilter10AlwaysTrueEv directly?

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/filter-context.cc@281
PS8, Line 281:
nit: redundant blank line

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/filter-context.cc@512
PS8, Line 512: if (computed_ratio) {
: *computed_ratio = ratio;
: }
nit: our code style prefers collapsing a simple if-statement to one line
if (computed_ratio) *computed_ratio = ratio;

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@329
PS8, Line 329: DetermineUsefulnessForMinmaxFilters();
Should we codegen FilterContext::ShouldRejectFilterBasedOnColumnStats() if we want to call this for each batch (e.g. eliminate the switch branch in it in codegen)?

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@335
PS8, Line 335: if (filter_ctxs_.size() == 0) {
: return;
: }
nit: our code style prefers collapsing a simple if-statement to one line
if (filter_ctxs_.size() == 0) return;

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@342
PS8, Line 342: if (ctx.local_min_max_filter->AlwaysTrue()) {
: continue;
: }
nit: our code style prefers collapsing a simple if-statement to one line
if (ctx.local_min_max_filter->AlwaysTrue()) continue;

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@349
PS8, Line 349: auto
nit: should we use "const auto&" instead?

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@351
PS8, Line 351:
nit: our code style uses 4 spaces of indentation here

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@355
PS8, Line 355: if (min_ratio > ratio) {
: min_ratio = ratio;
: }
nit: our code style prefers collapsing a simple if-statement to one line
if (min_ratio > ratio) min_ratio = ratio;

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@992
PS8, Line 992: auto
nit: should we use "const auto&" instead?

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/runtime/runtime-filter-ir.cc
File be/src/runtime/runtime-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/runtime/runtime-filter-ir.cc@35
PS8, Line 35: && !filter->AlwaysTrue()
Should we move this into LIKELY()?

--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen
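The symbol strings quoted in the gen_ir_descriptions.py comment are Itanium C++ ABI mangled names. As a small aid for reading them, the sketch below demangles such a string with abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang). The Demangle() helper is a hypothetical name introduced here; it is not part of Impala.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium C++ ABI symbol name, or the
// input unchanged if the name cannot be demangled.
std::string Demangle(const std::string& mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with
  // malloc(); the caller must release it with free().
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) return mangled;
  std::string result(out);
  std::free(out);
  return result;
}
```

Reading _ZNK6impala12MinMaxFilter10AlwaysTrueEv piece by piece: _Z starts a mangled name, N...E wraps a nested name, K marks a const member function, each number is the length of the identifier that follows (6impala, 12MinMaxFilter, 10AlwaysTrue), and the trailing v means the function takes no parameters. Demangle() therefore yields impala::MinMaxFilter::AlwaysTrue() const, which is the base-class method the reviewer asks about, as opposed to the per-type entries such as impala::IntMinMaxFilter::AlwaysTrue() const.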
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8603/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 8 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 20 Apr 2021 13:20:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting its 'always_true_' flag to true when the filter is found not to be useful. When inserting into a filter whose 'always_true_' flag is true, the steps from the evaluation of the value from a row to the verification of the value in the min/max range are skipped completely. The above optimization is also LLVM-codegen'd. Testing: 1. Ran core test; 2. Ran performance test (TBD). Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h 9 files changed, 178 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/8 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 8 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Wenzhe Zhou