Wenzhe Zhou has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as a wrapper around kudu::BlockBloomFilter.
impala::BloomFilter builds the runtime bloom filter through the
kudu::BlockBloomFilter APIs, with FastHash as the default hash algorithm.
Removed the duplicated functions from the impala::BloomFilter class.
Pushed the bloom filter down to Kudu through the Kudu client API.
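
For readers unfamiliar with the data structure, the following is a minimal
Python sketch of a block bloom filter and of how a filter built on the join
build side can prune rows on the scan side. It is not the
kudu::BlockBloomFilter or impala::BloomFilter code: the class name
BlockBloomSketch, the salt constants, and the use of BLAKE2b in place of
FastHash are all assumptions made for illustration.

```python
import hashlib

WORDS_PER_BUCKET = 8  # each bucket is 8 x 32-bit words (256 bits)
# Odd multipliers used to pick one bit per word; illustrative values only.
SALTS = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
         0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)


class BlockBloomSketch:
    """Hypothetical block bloom filter: every key touches a single bucket."""

    def __init__(self, log_num_buckets):
        self.num_buckets = 1 << log_num_buckets
        self.buckets = [[0] * WORDS_PER_BUCKET for _ in range(self.num_buckets)]

    @staticmethod
    def _hash64(key):
        # Stand-in 64-bit hash (assumption); the patch uses FastHash instead.
        digest = hashlib.blake2b(repr(key).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def _bucket_and_mask(self, key):
        h = self._hash64(key)
        bucket_idx = (h >> 32) & (self.num_buckets - 1)  # high bits pick the bucket
        low = h & 0xFFFFFFFF                             # low bits pick the bits
        mask = [1 << (((low * salt) & 0xFFFFFFFF) >> 27) for salt in SALTS]
        return bucket_idx, mask

    def insert(self, key):
        idx, mask = self._bucket_and_mask(key)
        bucket = self.buckets[idx]
        for i in range(WORDS_PER_BUCKET):
            bucket[i] |= mask[i]

    def find(self, key):
        # True means "possibly present"; False means "definitely absent".
        idx, mask = self._bucket_and_mask(key)
        bucket = self.buckets[idx]
        return all((bucket[i] & mask[i]) == mask[i] for i in range(WORDS_PER_BUCKET))

    def union(self, other):
        if other.num_buckets != self.num_buckets:
            raise ValueError("filters must have the same size")
        for mine, theirs in zip(self.buckets, other.buckets):
            for i in range(WORDS_PER_BUCKET):
                mine[i] |= theirs[i]


# Build side: insert the join keys. Scan side: drop rows the filter rejects.
build_keys = [10, 20, 30]
bf = BlockBloomSketch(4)
for k in build_keys:
    bf.insert(k)
surviving = [row for row in range(100) if bf.find(row)]
```

The filter never rejects a key that was inserted, so the scan can safely
discard every row for which find() returns False; false positives are
removed later by the join itself.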

Added a new query option, ENABLED_RUNTIME_FILTER_TYPES, to set the enabled
runtime filter types; it currently affects only Kudu scan nodes. By
default, the bloom filter is not enabled and only the min-max filter is
enabled for Kudu. With this option, users can enable the bloom filter,
the min-max filter, or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and the end-to-end runtime_filters
test for pushing down bloom filters to Kudu.
Added test cases to compare the number of rows returned from the Kudu
scan when applying different types of runtime filters to the same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all core tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H at scale factor 30 for Parquet
   and Kudu. Verified that the new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression, for query TPCH-Q9, and about
   8 queries improve when both bloom and min-max filters are applied.
   The bloom filter reduces the number of rows returned from the
   Kudu scan, and hence the cost of aggregation and hash join. But
   bloom filter evaluation adds extra cost to the Kudu scan, which
   offsets the gain in aggregation and join. The Kudu scan needs to
   be optimized for bloom filters in follow-up tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for the Insert/Find/Union functions, with or without
   AVX2, due to the bloom-filter implementation changes. There is a
   small performance degradation for the Init function, but this
   function is not on the hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,523 insertions(+), 623 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/20
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
