Wenzhe Zhou has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/15683 )
Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu ...................................................................... IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter. impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter APIs with FastHash as default hash algorithm. Removed the duplicated functions from impala::BloomFillter class. Pushed down bloom filter to Kudu through Kudu clinet API. Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled runtime filter types, which only affect Kudu scan node now. By default, bloom filter is not enabled, only min-max filter will be enabled for Kudu. With this option, user could enable bloom filter, min-max filter, or both bloom and min-max runtime filters. Added new test cases in PlannerTest and end-end runtime_filters test for pushing down bloom filter to Kudu. Added test cases to compare the number of rows returned from Kudu scan when appling different types of runtime filter on same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bump Kudu version to d652cab17. Testing: - Passed all core tests. Performance benchmark: - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet and Kudu. Verified that new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvement for about 8 queris when appling both bloom and min-max filters. The bloom filter reduce the number of rows returned from Kudu scan, hence reduce the cost for aggregation and hash join. But bloom filter evaluation add extra cost for Kudu scan, which offset the gain on aggregation and join. Kudu scan need to be optimized for bloom filter in following tasks. - Ran bloom-filter microbenchmarks and verified that there is no regression for Insert/Find/Union functions with or without AVX2 due to bloom-filter implementation changes. There is small performance degration for Init function, but this function is not in hot path. Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 --- M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/kudu-scanner.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/hash-util.h M bin/impala-config.sh M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M tests/query_test/test_runtime_filters.py M tests/query_test/test_spilling.py 37 files changed, 1,523 insertions(+), 623 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/20 -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 20 Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbap...@apache.org> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>