[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 26: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6224/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 26 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 05 Jun 2020 18:34:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Thomas Tauber-Marshall has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all exhaustive tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degradation for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Reviewed-on: http://gerrit.cloudera.org:8080/15683 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 36 files changed, 2,024 insertions(+), 661 deletions(-) Approvals: Thomas Tauber-Marshall: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 27 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all exhaustive tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degradation for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 36 files changed, 2,024 insertions(+), 661 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/26 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 26 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 24: Could you do a pass over this with clang-format-diff? It's ready to be +2ed except for a few formatting issues. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 24 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 04 Jun 2020 17:37:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 24: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6202/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 24 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 03 Jun 2020 14:24:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 23: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6201/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 23 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 03 Jun 2020 14:06:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all exhaustive tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degration for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 36 files changed, 2,017 insertions(+), 651 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/24 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 24 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#23). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all exhaustive tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degration for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 36 files changed, 2,017 insertions(+), 651 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/23 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 23 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 22: (23 comments) http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py File be/src/codegen/gen_ir_descriptions.py: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py@204 PS22, Line 204: ["RAW_VALUE_GET_HASH_VALUE", > I don't think this is being used anywhere anymore, so it can be removed. Right, will remove this entry. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc@247 PS22, Line 247: NULL > nit: we prefer nullptr, here and elsewhere Will fix it. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h File be/src/kudu/util/block_bloom_filter.h: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h@146 PS22, Line 146: // It's possible that log_heap_size is less than kLogBucketWordBits in unit test. > We definitely want to avoid making any changes to the be/src/kudu directory Will change bloom-filter-test.cc to avoid hitting this error. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h File be/src/runtime/raw-value.h: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@86 PS22, Line 86: GetHashValueFastHash32 > I think you mean 'GetHashValueFastHash', i.e. this function doesn't return Right, fix it. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@93 PS22, Line 93: GetHashValueFastHash32NonNull > Same here - this doesn't return a 32-bit value, so I think you meant to say will fix it. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc File be/src/util/bloom-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc@34 PS22, Line 34: // kudu::BlockBloomFilter::kCpu is a static variable and is initialized once. : // To temporarily disable AVX2 for Bloom Filter in runtime testing, set flag : // disable_blockbloomfilter_avx2 as true. See kudu::BlockBloomFilter::has_avx2(). : // kudu::BlockBloomFilter::has_avx2() is called by kudu::BlockBloomFilter : // constructor and kudu::BlockBloomFilter::OrEqualArrayInternal(). Impala : // testing code should set this flag before creating impala::BloomFilter object : // which affect impala::BloomFilter::Insert() and impala::BloomFilter::Find(), : // or before calling impala::BloomFilter::or(). : // This flag has no effect if the target CPU doesn't support AVX2. > This comment is a lot longer than necessary. Probably fine to just say some will make it short. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@40 PS22, Line 40: class BlockBloomFilter; : class BlockBloomFilterBufferAllocatorIf; : class Slice; > nit: I think currently as written none of these are needed, since Slice isn will remove these three lines. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@70 PS22, Line 70: class ImpalaBloomFilterBufferAllocator : public kudu::BlockBloomFilterBufferAllocatorIf { > I think this is now only referenced in bloom-filter.cc, so we can just move A buffer allocator is defined as regular member variable in BloomFilter class. If moving this class definition to bloom-filter.cc, we will get a compilation error which complains incomplete type for the allocator member variable. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@210 PS22, Line 210: * > nit: formatting will fix it http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@213 PS22, Line 213: /// Buffer allocator is owned by the bloom filter. > This comment seems weird. Maybe say something like "used by kudu to allocat will fix it. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@215 PS22, Line 215: e > nit: 'E' Will fix it. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@35 PS22, Line 35: #include "util/kudu-status-util.h" > nit: please alphabetize Will fix it http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@49 PS22, Line 49: delete > In general, we try to avoid using 'delete' as its error prone. will change it as regular member variable. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@52 PS22, Line 52: // Kudu only support FastHash for
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 22: (23 comments) The patch is looking pretty good, thanks for all the work on this! Mostly just some nits and additional testing left. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py File be/src/codegen/gen_ir_descriptions.py: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py@204 PS22, Line 204: ["RAW_VALUE_GET_HASH_VALUE", I don't think this is being used anywhere anymore, so it can be removed. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc@247 PS22, Line 247: NULL nit: we prefer nullptr, here and elsewhere http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h File be/src/kudu/util/block_bloom_filter.h: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h@146 PS22, Line 146: // It's possible that log_heap_size is less than kLogBucketWordBits in unit test. We definitely want to avoid making any changes to the be/src/kudu directory if possible, since it makes upgrading the newer versions trickier. Is it possible to rewrite the tests to not trigger this? http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h File be/src/runtime/raw-value.h: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@86 PS22, Line 86: GetHashValueFastHash32 I think you mean 'GetHashValueFastHash', i.e. this function doesn't return a 32-bit value. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@93 PS22, Line 93: GetHashValueFastHash32NonNull Same here - this doesn't return a 32-bit value, so I think you meant to say GetHashValueFastHashNonNull http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc File be/src/util/bloom-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc@34 PS22, Line 34: // kudu::BlockBloomFilter::kCpu is a static variable and is initialized once. : // To temporarily disable AVX2 for Bloom Filter in runtime testing, set flag : // disable_blockbloomfilter_avx2 as true. See kudu::BlockBloomFilter::has_avx2(). : // kudu::BlockBloomFilter::has_avx2() is called by kudu::BlockBloomFilter : // constructor and kudu::BlockBloomFilter::OrEqualArrayInternal(). Impala : // testing code should set this flag before creating impala::BloomFilter object : // which affect impala::BloomFilter::Insert() and impala::BloomFilter::Find(), : // or before calling impala::BloomFilter::or(). : // This flag has no effect if the target CPU doesn't support AVX2. This comment is a lot longer than necessary. Probably fine to just say something like "This flag is used in Kudu to temporarily disable avx2 support for testing purposes" http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@40 PS22, Line 40: class BlockBloomFilter; : class BlockBloomFilterBufferAllocatorIf; : class Slice; nit: I think currently as written none of these are needed, since Slice isn't used here and you're importing block_bloom_filter anyways. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@70 PS22, Line 70: class ImpalaBloomFilterBufferAllocator : public kudu::BlockBloomFilterBufferAllocatorIf { I think this is now only referenced in bloom-filter.cc, so we can just move the definition there. http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@210 PS22, Line 210: * nit: formatting http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@213 PS22, Line 213: /// Buffer allocator is owned by the bloom filter. This comment seems weird. Maybe say something like "used by kudu to allocate memory for 'block_bloom_filter_' http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@215 PS22, Line 215: e nit: 'E' and please leave an extra line between this and the definition above http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@35 PS22, Line 35: #include "util/kudu-status-util.h" nit: please alphabetize http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@49 PS22, Line 49: delete In general, we try to avoid using 'delete' as its error prone. I'm not sure if there's a reason why it needs to be a pointer and not just a be a
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6186/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 22 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 01 Jun 2020 22:21:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all exhaustive tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degration for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 37 files changed, 1,524 insertions(+), 622 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/22 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 22 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 21: (2 comments) http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h@73 PS21, Line 73: use > uses fixed http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc@202 PS21, Line 202: BufferPool *buffer_pool = ExecEnv::GetInstance()->buffer_pool(); : DCHECK(buffer_pool_client_ != nullptr); > With this implementation, a check that is_allocated is not true is needed t will call close() first. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 21 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 01 Jun 2020 21:24:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 21: (2 comments) http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h@73 PS21, Line 73: use uses http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc@202 PS21, Line 202: BufferPool *buffer_pool = ExecEnv::GetInstance()->buffer_pool(); : DCHECK(buffer_pool_client_ != nullptr); With this implementation, a check that is_allocated is not true is needed to make sure previous allocation is not leaked? -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 21 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 01 Jun 2020 19:19:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6166/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 21 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 29 May 2020 18:25:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all core tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degration for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 37 files changed, 1,522 insertions(+), 622 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/21 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 21 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6133/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 20 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sat, 23 May 2020 22:54:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all core tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degration for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 37 files changed, 1,523 insertions(+), 623 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/20 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 20 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6121/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 18 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 21 May 2020 01:21:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Testing: - Passed all core tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degration for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 38 files changed, 1,533 insertions(+), 622 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/18 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 18 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 17: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6116/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 17 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 20 May 2020 18:43:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option to set enabled runtime filter types, which only affect Kudu scan node now. By default, both bloom filter and min-max filter will be enabled for Kudu. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Updated bloom-filter-benchmark for the bloom-filter implementation change. Testing: - Passed test_kudu.py - Passed end-end test_runtime_filters.py. - Passed frontend Planner tests. - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. - Ran bloom-filter-benchmark and verified there is no regression due to bloom-filter implementation changes. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 37 files changed, 1,380 insertions(+), 622 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/17 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 17 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc File be/src/benchmarks/bloom-filter-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc@64 PS12, Line 64: // Machine Info: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz > Reran the the bloom filter benchmark on my desktop before and after applyin Sounds good; thanks! -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 14 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 14 May 2020 14:02:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 14: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6058/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 14 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 14 May 2020 07:10:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option to set enabled runtime filter types, which only affect Kudu scan node now. By default, both bloom filter and min-max filter will be enabled for Kudu. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Updated bloom-filter-benchmark for the bloom-filter implementation change. Testing: - Passed test_kudu.py - Passed end-end test_runtime_filters.py. - Passed frontend Planner tests. - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. - Ran bloom-filter-benchmark and verified there is no regression due to bloom-filter implementation changes. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py 34 files changed, 1,420 insertions(+), 633 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/14 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 14 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/exec/kudu-scanner.cc@252 PS12, Line 252: ImpalaBloomFilterBufferAllocator* allocator = : new ImpalaBloomFilterBufferAllocator( : state_->filter_bank()->buffer_pool_client()); This doesn't look right. The allocator that is used to create BBF must be passed down to kudu. In current impl, the one stored in impala BF must be passed down. https://gerrit.cloudera.org/c/15683/12/be/src/util/bloom-filter.h#214 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 12 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 11 May 2020 23:15:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc File be/src/benchmarks/bloom-filter-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc@64 PS12, Line 64: // Machine Info: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz I'd suggest re-running these on your machine before and after applying this patch, to see if there are any regressions. There are an indirection or two that hopefully will be inlined away, but I'm not sure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 12 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 11 May 2020 03:19:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 12: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6016/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 12 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 11 May 2020 01:48:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option to set enabled runtime filter types, which only affect Kudu scan node now. By default, both bloom filter and min-max filter will be enabled for Kudu. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Updated bloom-filter-benchmark for the bloom-filter implementation change. Testing: - Passed test_kudu.py - Passed end-end test_runtime_filters.py. - Passed frontend Planner tests. - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. - Ran bloom-filter-benchmark and verified there is no regression due to bloom-filter implementation changes. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py 34 files changed, 1,339 insertions(+), 619 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/12 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 12 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@71 PS11, Line 71: if (directory_in_size == 0) > Could you clarify with a comment in what case would directory_in_size be 0 wdirectory_size equal 0 only when the fileter is always false. In the case the bloom filter will not be pushed to Kudu. Will add a comment here. http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@201 PS11, Line 201: if (is_allocated_) { : LOG(DFATAL) << "Each call to AllocateBuffer must have a corresponding call " : << "to FreeBuffer."; : Close(); // Ensure that any previously allocated memory is released. : } > Same instance of allocator can be used for multiple BBFs. So this doesn't l Line 201 to 205 are not useful. Will remove them. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 11 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sat, 09 May 2020 17:03:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 11: (2 comments) Apologize for not taking a look earlier. Lot of the changes looked specific to Impala. I mainly looked at Impala BBF code that was updated to invoke one from kudu-util. http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@71 PS11, Line 71: if (directory_in_size == 0) Could you clarify with a comment in what case would directory_in_size be 0 since this Init variant should really be used for deserialization case. http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@201 PS11, Line 201: if (is_allocated_) { : LOG(DFATAL) << "Each call to AllocateBuffer must have a corresponding call " : << "to FreeBuffer."; : Close(); // Ensure that any previously allocated memory is released. : } Same instance of allocator can be used for multiple BBFs. So this doesn't look right to me. With that in mind, I don't see need for is_allocated_member variable. ImpalaBloomFilterBufferAllocator should be a simple wrapper on top of Impala's default buffer pool and should be implemented as a singleton. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 11 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 08 May 2020 00:11:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 11: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5995/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 11 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 07 May 2020 18:47:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option to set enabled runtime filter types, which only affect Kudu scan node now. By default, both bloom filter and min-max filter will be enabled for Kudu. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Updated bloom-filter-benchmark for the bloom-filter implementation change. Testing: - Passed test_kudu.py - Passed end-end test_runtime_filters.py. - Passed frontend Planner tests. - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. - Ran bloom-filter-benchmark and verified there is no regression due to bloom-filter implementation changes. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py 34 files changed, 1,344 insertions(+), 620 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/11 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 11 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 9: (17 comments) http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt File be/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234 PS9, Line 234: "-DKUDU_HEADERS_NO_STUBS" > nit formatting: trailing whitespace and you can wrap this into the rest of will fix as suggested. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91 PS9, Line 91: //local_bloom_filter->Insert(val, expr_eval->root().type()); > I guess this was left by accident? Right, will remove these two lines. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106 PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> n_regionkey > This needs to be updated (though of course it will only change slightly). I will add LlvmCodeGen::Print(*fn) at the end of function to dump the codegen functions, then update the sample in the functions' comments. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232 PS9, Line 232: } : else if > nit: formatting ('else if' should go on the same line as prior '}') Will fix it. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234 PS9, Line 234: continue; > Brief comment, eg. 'This filter won't actually remove any rows so we don't will add comments http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236 PS9, Line 236: else > nit: it doesn't actually change how the code works, but it would be better will remove "else" http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238 PS9, Line 238: if (filter != nullptr) { > I think we can save ourselves some duplication and an indention level by ch Agree, will change the code. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239 PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find( : scan_node_->id()); : const TRuntimeFilterTargetDesc _desc = : ctx.filter->filter_desc().targets[it->second]; : const string _name = target_desc.kudu_col_name; : DCHECK(col_name != ""); > This is duplicated below, I think we can eliminate the duplication by movin That's right. Will remove the duplication. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53 PS9, Line 53: class BloomFilterBufferAllocator; > I don't think this is necessary will remove it. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73 PS9, Line 73: BloomFilterBufferAllocator > nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no Will rename it as suggested http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91 PS9, Line 91: std::shared_ptr Clone() const override { > If this is something that's really only used in Kudu's internal testing and Will add LOG(FATAL) and simplify the logic of allocate/close functions. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111 PS9, Line 111: /// A BloomFilter stores sets of items and offers a query operation indicating whether or > This class comment could probably use some updating to reflect the fact tha Will update the comments. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171 PS9, Line 171: void Insert(void* val, const ColumnType& col_type) noexcept; > Is this version of Insert() used anywhere? Same with the equivalent Find() Will remove them http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241 PS9, Line 241: HashAlgorithm hash_algorithm_; > As far as I can tell, this isn't actually getting used anywhere, and I don' Will remove it. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@205 PS9, Line 205: if (is_allocated_) Close(); > I suspect its probably supposed to be guaranteed that FreeBuffer() is alway Will add LOG(DFATAL). http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@226 PS9, Line 226: Close(); // Ensure that any previously allocated memory is released. > I wonder if this is really necessary - I would hope that Kudu provides a gu Will LOG(DFALTAL) if is allocated.
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 9: (17 comments) http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt File be/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234 PS9, Line 234: "-DKUDU_HEADERS_NO_STUBS" nit formatting: trailing whitespace and you can wrap this into the rest of the string instead of putting it on its own line http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91 PS9, Line 91: //local_bloom_filter->Insert(val, expr_eval->root().type()); I guess this was left by accident? http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106 PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> n_regionkey This needs to be updated (though of course it will only change slightly). If you're not sure how to get this, see: https://cwiki.apache.org/confluence/display/IMPALA/Codegen (the easiest way is probably just to add a line at the end of this function that logs fn->dump() or whatever) http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232 PS9, Line 232: } : else if nit: formatting ('else if' should go on the same line as prior '}') Not sure if I've bugged you about clang-format before, but its a great way to help ensure that you're code reviews don't have these sorts of simple errors. See: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 Note that clang-format is just a guide and if it's suggestion differs from surrounding code then you can feel free to ignore it http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234 PS9, Line 234: continue; Brief comment, eg. 'This filter won't actually remove any rows so we don't need to push it down to Kudu' http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236 PS9, Line 236: else nit: it doesn't actually change how the code works, but it would be better to leave off this 'else' since its not actually mutually exclusive with the prior 'if's http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238 PS9, Line 238: if (filter != nullptr) { I think we can save ourselves some duplication and an indention level by changing this to 'ctx.filter->HasFilter()' and moving it up to the 'if(AlwaysTrue())' http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239 PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find( : scan_node_->id()); : const TRuntimeFilterTargetDesc _desc = : ctx.filter->filter_desc().targets[it->second]; : const string _name = target_desc.kudu_col_name; : DCHECK(col_name != ""); This is duplicated below, I think we can eliminate the duplication by moving this up to before the 'if(is_bloom)' http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53 PS9, Line 53: class BloomFilterBufferAllocator; I don't think this is necessary http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73 PS9, Line 73: BloomFilterBufferAllocator nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no confusion http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91 PS9, Line 91: std::shared_ptr Clone() const override { If this is something that's really only used in Kudu's internal testing and won't ever get hit in Impala, it might be better to just do a LOG(FATAL) here or something, since then you would be able to simplify the logic of AllocateBuffer()/Close()/etc. and not support a code path that's never getting hit. Probably have to check with Bankim to ensure that's really the case, though http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111 PS9, Line 111: /// A BloomFilter stores sets of items and offers a query operation indicating whether or This class comment could probably use some updating to reflect the fact that this class is now basically a thin wrapper around Kudu's bloom filter http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171 PS9, Line 171: void Insert(void* val, const ColumnType& col_type) noexcept; Is this version of Insert() used anywhere? Same with the equivalent Find() below http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241 PS9,
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 9: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5939/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 9 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sun, 03 May 2020 07:08:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option to set enabled runtime filter types, which only affect Kudu scan node now. By default, both bloom filter and min-max filter will be enabled for Kudu. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Updated bloom-filter-benchmark for the bloom-filter implementation change. Testing: - Passed test_kudu.py - Passed end-end test_runtime_filters.py. - Passed frontend Planner tests. - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. - Ran bloom-filter-benchmark and verified there is no regression due to bloom-filter implementation changes. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/bloom-filter.inline.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py 35 files changed, 1,472 insertions(+), 557 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/9 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 9 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 8: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5920/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 8 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 01 May 2020 00:11:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option to set enabled runtime filter types, which only affect Kudu scan node now. By default, both bloom filter and min-max filter will be enabled for Kudu. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Updated bloom-filter-benchmark for the bloom-filter implementation change. Upgraded Kudu version to 389d4f1e1 to pull in Kudu Bloom Filter support which is needed for the Kudu/Impala Bloom Filter integration. Testing: - Passed test_kudu.py - Passed end-end test_runtime_filters.py. - Passed frontend Planner tests. - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. - Ran bloom-filter-benchmark and verified there is no regression due to bloom-filter implementation changes. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/bloom-filter.inline.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py 36 files changed, 1,469 insertions(+), 560 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/8 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 8 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5877/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 7 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sat, 25 Apr 2020 05:07:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter, which build runtime bloom filter in Kudu BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option to set enabled runtime filter types. By default, both bloom filter and min-max filter will be enabled for Kudu. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Testing: Passed end-end runtime filter tests. Passed frontend Planner tests. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/bloom-filter.inline.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py 33 files changed, 1,445 insertions(+), 522 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/7 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 7 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 6: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5860/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 23 Apr 2020 08:00:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter, which build runtime bloom filter in Kudu BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from BloomFillter class. Added a query option to set enabled runtime filter types. Pushed down runtime filters to Kudu through Kudu client APIs. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Testing: Passed end-end runtime filter tests. Passed frontend Planner tests. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/bloom-filter.inline.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M tests/query_test/test_runtime_filters.py 29 files changed, 1,262 insertions(+), 365 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/6 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124 PS4, Line 124: MIN_MAX = 2 > It's possible that the comma separated list thing is a bad idea and won't w Query option setting is treated i as a string in impala-shell. We can set the option value with comma in impala-shell. But we will get parsing error in function impala::ParseQueryOptions() since comma is used as separator of a list of query option KV string, like "runtime_filter_min_size=1024, runtime_filter_max_size=65536". If we set "enabled_runtime_filter_type=bloom, min_max", the function parse the string as two KV string "enabled_runtime_filter_type=bloom," and "min_max". It get parsing error when parsing second KV string "min_max" since there is no "=" in the KV string. As discussed offline, we will not support comma separated filter types. Instead, rename BLOOM_MIN_MAX to ALL. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 23 Apr 2020 06:37:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124 PS4, Line 124: MIN_MAX = 2 > The code for showing the current settings in impala-shell are auto generate It's possible that the comma separated list thing is a bad idea and won't work, but I'm not sure why the shell would be affected by it. I think it should be possible to just treat is as a string from the perspective of impala-shell, its only necessary to break it up into a list internally in Impala. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 20 Apr 2020 18:56:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h@80 PS4, Line 80: ublic: > unnecessary It's defined to support the "clone" virtual function in base class. http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h@91 PS4, Line 91: std::shared_ptr Clone() const override { > I don't understand what this is supposed to do - why wouldn't the cloned al Yes, we cannot get the buffer pool from runtime filter bank without calling context. http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift@521 PS4, Line 521: > Seems a little confusing, since users don't specify the numbers they use "B Fixed http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift@524 PS4, Line 524: R_TYP > We'll eventually want to apply min-max filters in other places like HDFS sc Fixed http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124 PS4, Line 124: MIN_MAX = 2 > We probably eventually want to add in-list filters as well, and we won't wa The code for showing the current settings in impala-shell are auto generated. Still try to figure out how to show the setting for enabled filter type as comma separated list of filter type. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 17 Apr 2020 18:57:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5817/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 17 Apr 2020 07:14:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/5/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/15683/5/tests/query_test/test_runtime_filters.py@31 PS5, Line 31: from tests.common.test_dimensions import ( flake8: F401 'tests.common.test_dimensions.create_exec_option_dimension' imported but unused -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 17 Apr 2020 07:04:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter, which build runtime bloom filter in Kudu BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from BloomFillter class. Added a query option to set enabled runtime filter types. Pushed down runtime filters to Kudu through Kudu client APIs. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Testing: Passed end-end runtime filter tests. Passed frontend Planner tests. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/bloom-filter.inline.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M tests/query_test/test_runtime_filters.py 28 files changed, 834 insertions(+), 362 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/5 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > Never mind the question above 'In that case would the bloom filter be redun Thanks for your comments. I will try to add some heuristic in planner when assign filter to Kudu scanner node. By default, we assign both types of filter to Kudu. http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test: http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@667 PS3, Line 667: # RUNTIME_FILTER_SCHEME_KUDU is set as BLOOM, Bloom filter is assigned > One additional useful test is with a SEMI-JOIN, since it is a common patter Will add the suggested case. http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@722 PS3, Line 722: RUNTIME_FILTER_SCHEME_KUDU=BLOOM_MIN_MAX > Suggest setting the explain_level=2 such that the runtime filter shows what Will change the explain_level=2 as suggested. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 18:59:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: Code-Review+1 (2 comments) Couple of suggestions about the tests, otherwise the planner side changes LGTM. I did look at the backend changes but would be better for someone else to review it too. http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test: http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@667 PS3, Line 667: # RUNTIME_FILTER_SCHEME_KUDU is set as BLOOM, Bloom filter is assigned One additional useful test is with a SEMI-JOIN, since it is a common pattern : SELECT COUNT(*) FROM alltypes a WHERE a.id IN (SELECT b.id FROM alltypes b WHERE b.int_col < 10); http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@722 PS3, Line 722: RUNTIME_FILTER_SCHEME_KUDU=BLOOM_MIN_MAX Suggest setting the explain_level=2 such that the runtime filter shows what type of filter it is 'bloom' or 'min-max'. e.g RF000[bloom] -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 18:22:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Aman Sinha has removed a vote on this change. Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Removed Code-Review+1 by Aman Sinha -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > Thanks Bankim, Todd for the.illustrative explanation. Agreed, for the (co Never mind the question above 'In that case would the bloom filter be redundant ?'. I must have been thinking of the FALSE case for the first predicate. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 14:49:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > BTW forgot that I took some measurements of the ApplyPredicatePrimitive per Thanks Bankim, Todd for the.illustrative explanation. Agreed, for the (compound) primary keys, the min-max predicate is well suited for range pruning - that was my thought too in my initial comment. For #3, I didn't think it would provide much benefit but your numbers look compelling enough for the 'with predicate' case that it seems worth it. It sounds like Todd, you would even prefer evaluating that first. In that case, would the bloom filter be redundant ? It sounds like current behavior (from Bankim's comment) is the BF is evaluated first. In any case, yeah on further thought, the overhead of sending both is not too much and we could leave the decision making to Kudu. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 05:47:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: Sorry for repeated comments, but one more thought occurred me to me: the most effective case of min-max filter is when min == max, and thus the filter becomes an equality. An equality predicate can then serve to prune Kudu hash partitions, which isn't doable with bloom or range predicates. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 04:12:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > Right, sortedness in kudu is based on the (compound) primary key. So, min-m BTW forgot that I took some measurements of the ApplyPredicatePrimitive performance bfeore: int8 NOT NULL (c >= 0 AND c < 2) 3152.2M evals/sec 0.88 cycles/eval int16 NOT NULL (c >= 0 AND c < 2) 3309.6M evals/sec 0.85 cycles/eval int32 NOT NULL (c >= 0 AND c < 2) 3384.0M evals/sec 0.85 cycles/eval int64 NOT NULL (c >= 0 AND c < 2) 1847.6M evals/sec 1.57 cycles/eval float NOT NULL (c >= 0 AND c < 2) 3268.3M evals/sec 0.88 cycles/eval double NOT NULL (c >= 0 AND c < 2) 2245.2M evals/sec 1.27 cycles/eval -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 03:56:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > This is my understanding, I'll check further within Kudu team. Right, sortedness in kudu is based on the (compound) primary key. So, min-max predicates can help in the following ways: (1) eliminate range partitions -- eg a min/max on a timestamp could would help a lot if the table is timestamp-partitioned by day, whereas a bloom filter on timestamp is probably not useful since cardinality is very high (2) convert to primary key-based range scans -- if we have inequality/equality predicates on a prefix of the primary key, we can convert those predicates into a range scan on the table using b-tree style indexes. Again if you had a min-max on timestamp and timestamp were the leading portion of your key, it's a big win. Or, if you had a min-max on timestamp, a composite primary key of (entity_id, timestamp), and an equality predicate on entity_id, it would still do the PK range scan conversion for a big win. (3) normal row-by-row evaluation - this is still useful since, particularly for primitive columns, it's very fast to evaluate (SIMD-accelerated in latest release, etc). It's likely much faster to evaluate than a bloom filter which will probably have some frequency of L1 cache misses if not L2/LLC, plus the hash calculation and mixing itself which likely takes a couple cycles. So, you're right that it may not eliminate much data in some queries, but the cost might be small enough that it's worth doing even if it eliminates 10-20%. To validate this I loaded 800M rows into a table using 'kudu perf loadgen' and then compared scanning with $ :./build/latest/bin/kudu perf table_scan localhost default.loadgen_auto_0b01a9eba6d54f2ba7fa5b59e3a597c5 --columns=int_val --num-threads=10 --predicates ["AND", [">=", "int_val", 0]] vs the same without the predicate. In this case the predicate matches all rows, but there's no special short-circuit logic, so it actually evaluates it everywhere. Wall clock times: with the predicate: 1.7023 +- 0.0266 seconds time elapsed ( +- 1.56% ) without the predicate: 1.6514 +- 0.0370 seconds time elapsed ( +- 2.24% ) Also running perf record -a, I can see that about 5% of the total CPU was used on the tserver in ApplyPredicatePrimitive. Admittedly, this overhead will go up on a percentage basis after some other in-flight work we have is finished (eg more CPU-efficient integer encoding bankim's working on) but I think we also have more room to optimize ApplyPredicatePrimitive by using AVX2 instead of just SSE4 instructions when available. As for the overhead of sending to the coordinator and broadcasting, isn't this a _tiny_ amount of data? ie it's only two values, which for primitives is going to be a max of 16 bytes each, and for varlen, you could feasibly truncate to a fixed size bound even in the case of a long string. So, I'm generally in favor of sending and evaluating both. In the best case it's a huge win (eliminating range partitions) and in the worst case it's probably not a major cost. One further thought here as a future extension: we could provide an API in the Kudu scanner that says "this predicate is optional: drop it if you think it's not filtering much". For the case of these propagated predicates feeding into a join, dropping them at runtime if they are ineffective is worthwhile (I think Impala already does something like this in its scanner path, right?) -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 03:49:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > Instead of 'are not sorted(partially sorted within each tablet)', I meant This is my understanding, I'll check further within Kudu team. When the predicate pushed to Kudu includes partition key columns then pruning using min-max filter will be done first eliminating bunch of files containing column values namely cfiles. When the predicate doesn't include partition key columns then all column values will need to be scanned and as per current implementation for every column value first Bloom filter will be checked before checking for range/min-max values[1]. So in case of non-partition key columns whether min-max filter will help depends upon whether min-max filter values are simply min and max of the values inserted in the Bloom filter or not derived from the values inserted in the Bloom filter. For the former case, I agree supplying min-max filter with Bloom filter won't help. [1] https://github.com/apache/kudu/blob/master/src/kudu/common/column_predicate.h#L332 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 03:23:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > In the case of both Bloom and Min-Max filters, is the order of evaluation d Instead of 'are not sorted(partially sorted within each tablet)', I meant that even within a tablet there's no sortedness of that column. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 02:19:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 In the case of both Bloom and Min-Max filters, is the order of evaluation determined by Kudu ? For columns that are not sorted (partially sorted within each tablet), the Min-Max filters could potentially not eliminate much and be wasted effort, so Bloom filter should be evaluated first. On the other hand, for sorted columns, it would make sense to apply Min-Max filters first since Bloom filter has false positives. It seems that for a specific column, creating two types of filters would incur overhead during execution - since both have to be sent to coordinator and broadcast to executors. Any thoughts on that ? -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 02:08:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5749/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 08 Apr 2020 06:20:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter, which build runtime bloom filter in Kudu BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from BloomFillter class. Added a query option to set runtime filter scheme for Kudu. Pushed down runtime filters to Kudu through Kudu client APIs. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Testing: Passed end-end runtime filter tests with codegen disabled. Passed frontend Planner tests. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/bloom-filter.inline.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M tests/query_test/test_runtime_filters.py 24 files changed, 624 insertions(+), 360 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/3 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5748/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 08 Apr 2020 05:08:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/15683/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15683/2/be/src/exec/kudu-scanner.cc@269 PS2, Line 269: const ColumnType& col_type = ColumnType::FromThrift(target_desc.kudu_col_type); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@713 PS2, Line 713: // Kudu only supports targeting a single column, not general exprs, so the target line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@729 PS2, Line 729: // Kudu only supports targeting a single column, not general exprs, so the target line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@732 PS2, Line 732: // Kudu also cannot currently return nulls if a filter is applied, so it does not line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 08 Apr 2020 04:50:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15683 Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter, which build runtime bloom filter in Kudu BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from BloomFillter class. Added a query option to set runtime filter scheme for Kudu. Pushed down runtime filters to Kudu through Kudu client APIs. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Testing: Passed end-end runtime filter tests with codegen disabled. Passed frontend Planner tests. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/raw-value.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/bloom-filter.inline.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M tests/query_test/test_runtime_filters.py 24 files changed, 623 insertions(+), 360 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/2 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Thomas Tauber-Marshall