Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/18141
to look at the new patch set (#2).
Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
......................................................................
WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
Currently Impala has two kinds of runtime filters: bloom filters and
min-max filters. Unfortunately, neither can leverage the bloom filters in
ORC files, because only EQUALS and IN-list predicates can use them to
skip unrelated ORC RowGroups, and we can't convert runtime bloom filters
or min-max filters into such predicates.
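A minimal sketch of why only a finite value list can use an ORC RowGroup
bloom filter: each value is probed individually, and the RowGroup is
skipped only if no value might be present. A min-max range or a runtime
bloom filter cannot be enumerated into such probes. ToyRowGroupBloomFilter
and CanSkipRowGroup are hypothetical names. The hashing here is a toy
assumption and does not match ORC's real bloom filter format.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a per-RowGroup ORC bloom filter. The real ORC
// format uses different hashing and sizing; this toy version sets two bits
// per value in a 1024-bit array. Like any bloom filter, it may return false
// positives but never false negatives.
class ToyRowGroupBloomFilter {
 public:
  void Add(int64_t v) {
    for (std::size_t h : Probes(v)) bits_.set(h);
  }

  bool MightContain(int64_t v) const {
    for (std::size_t h : Probes(v)) {
      if (!bits_.test(h)) return false;  // definitely absent
    }
    return true;  // possibly present
  }

 private:
  static constexpr std::size_t kBits = 1024;

  static std::array<std::size_t, 2> Probes(int64_t v) {
    uint64_t h1 = static_cast<uint64_t>(std::hash<int64_t>{}(v));
    // Derive a second probe by mixing h1 with a 64-bit odd constant.
    uint64_t h2 = (h1 ^ (h1 >> 31)) * 0x9E3779B97F4A7C15ULL;
    return {static_cast<std::size_t>(h1 % kBits),
            static_cast<std::size_t>((h2 >> 20) % kBits)};
  }

  std::bitset<kBits> bits_;
};

// An IN-list predicate is a finite set of values, so each one can be probed
// against the RowGroup's bloom filter. The RowGroup can be skipped only when
// none of the values might be present in it.
bool CanSkipRowGroup(const ToyRowGroupBloomFilter& row_group_bf,
                     const std::vector<int64_t>& in_list) {
  for (int64_t v : in_list) {
    if (row_group_bf.MightContain(v)) return false;
  }
  return true;
}
```

Because bloom filters have no false negatives, a RowGroup that actually
contains one of the IN-list values is never skipped. False positives only
cost an unnecessary read.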
This patch adds runtime IN-list filters for a small build side (e.g. #rows
<= 1024) of a broadcast join. Currently the IN-list filters only apply to
ORC tables and are pushed down to the ORC reader (i.e. the ORC lib).
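A minimal sketch of an IN-list filter built from a small build side.
InListFilterSketch is a hypothetical class and is not the API of the new
be/src/util/in-list-filter.h. It assumes the size limit is checked against
the number of distinct values rather than rows, and that exceeding it turns
the filter into an always-true (disabled) filter.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sketch; not the API of be/src/util/in-list-filter.h.
class InListFilterSketch {
 public:
  explicit InListFilterSketch(std::size_t max_entries)
      : max_entries_(max_entries) {}

  // Called for each build-side value. Once the number of distinct values
  // exceeds the limit, the filter becomes always-true and drops its values.
  void Insert(int64_t v) {
    if (always_true_) return;
    values_.insert(v);
    if (values_.size() > max_entries_) {
      always_true_ = true;
      values_.clear();
    }
  }

  bool AlwaysTrue() const { return always_true_; }

  // Values that a scanner could push down as an IN-list predicate.
  std::vector<int64_t> Values() const {
    return std::vector<int64_t>(values_.begin(), values_.end());
  }

  // An always-true filter must not reject anything.
  bool Find(int64_t v) const { return always_true_ || values_.count(v) > 0; }

 private:
  std::size_t max_entries_;
  bool always_true_ = false;
  std::unordered_set<int64_t> values_;
};
```

Usage: with a limit of 1024, inserting 1024 distinct keys keeps the filter
active, while the 1025th distinct key disables it.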
Evaluating runtime IN-list filters is much slower than evaluating runtime
bloom filters due to the current simple implementation (i.e.
std::unordered_set), so we disable them at row level.
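A minimal sketch of the resulting split: IN-list filters go only to the ORC
reader, and the per-row loop never probes the hash set. FilterKind,
RuntimeFilterSketch and PlanScanFilters are hypothetical names. For
simplicity the sketch assumes every other filter kind is evaluated at row
level, which may not match how Impala handles each kind.

```cpp
#include <vector>

// Hypothetical types for illustration; not Impala's RuntimeFilter classes.
enum class FilterKind { kBloom, kMinMax, kInList };

struct RuntimeFilterSketch {
  int id;
  FilterKind kind;
};

struct ScanFilterPlan {
  std::vector<int> pushed_down_to_orc;  // to become ORC IN-list predicates
  std::vector<int> row_level;           // evaluated per row by the scanner
};

// IN-list filters are only pushed down to the ORC reader. Probing a
// std::unordered_set for every row costs more than a bloom filter probe,
// so they are kept out of the row-level evaluation list.
ScanFilterPlan PlanScanFilters(const std::vector<RuntimeFilterSketch>& filters) {
  ScanFilterPlan plan;
  for (const RuntimeFilterSketch& f : filters) {
    if (f.kind == FilterKind::kInList) {
      plan.pushed_down_to_orc.push_back(f.id);
    } else {
      plan.row_level.push_back(f.id);
    }
  }
  return plan;
}
```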
Example query that will benefit from this patch:
use tpch_orc_def;
select count(*) from lineitem_bf join (
select * from partsupp, part
where ps_partkey = p_partkey and p_size = 15
and p_type like '%BRASS' and ps_availqty < 10) v
on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
The inline view populates a runtime IN-list filter of 4 items. Note that
we need to regenerate the lineitem table with bloom filters enabled
(e.g. by setting orc.bloom.filter.columns to
"l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity" in
tblproperties) so that the pushed-down IN-list filter achieves a better
filter rate.
TODO: fix tests due to plan changes.
Change-Id: I25080628233799aa0b6be18d5a832f1385414501
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter-test.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/data-stream-service.cc
M be/src/service/query-options-test.cc
M be/src/util/CMakeLists.txt
A be/src/util/in-list-filter-ir.cc
A be/src/util/in-list-filter.cc
A be/src/util/in-list-filter.h
M common/protobuf/data_stream_service.proto
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M tests/query_test/test_runtime_filters.py
30 files changed, 752 insertions(+), 120 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/18141/2
--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>