Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16499 )


Change subject: IMPALA-10112: Remove FpRateTooHigh() check for bloom filter
......................................................................

IMPALA-10112: Remove FpRateTooHigh() check for bloom filter

This patch removes the FpRateTooHigh() check for bloom filters, which
can disable a filter if the observed false-positive probability (FPP) is
higher than FLAGS_max_filter_error_rate. A filter with a high FPP is
still worth evaluating for several reasons:

1. Partition filters are probably still worth evaluating even if there
   are false positives, because it's cheap and eliminating a partition
   is still beneficial.
2. Runtime filters are dynamically disabled on the scan side if they are
   ineffective. An always-true filter is also still evaluated, so it is
   not entirely free either.
3. The disabling is fairly unlikely to kick in for partitioned joins
   because it's only applied to a small subset of the filter, before the
   Or() operation.
4. FpRateTooHigh() uses num_build_rows to approximate the actual FPP of
   the resulting filter. This can be inaccurate because it does not
   account for duplicate values of the filter key on the build side.
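
As an illustration of point 4, the sketch below uses the textbook bloom
filter approximation (1 - e^(-k*n/m))^k, not necessarily the exact
formula in the Impala backend. The filter size, hash count, row count,
and distinct-key count are all hypothetical values chosen for the
example. It compares the FPP estimated from the number of build rows
with the FPP estimated from the number of distinct keys.

```python
import math


def estimated_fpp(num_keys, num_bits, num_hashes=8):
    """Textbook bloom filter FPP estimate: (1 - e^(-k*n/m))^k.

    num_hashes=8 is an assumption made for this sketch.
    """
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


# Hypothetical scenario: a 1 MiB filter fed by 10M build rows that
# contain only 100K distinct join-key values.
NUM_BITS = 8 * 1024 * 1024
num_build_rows = 10_000_000
num_distinct_keys = 100_000

# Estimate based on row count, which counts every duplicate key again.
fpp_from_rows = estimated_fpp(num_build_rows, NUM_BITS)
# Estimate based on distinct keys, which is what actually sets bits.
fpp_from_ndv = estimated_fpp(num_distinct_keys, NUM_BITS)

print(f"FPP estimated from build rows:    {fpp_from_rows:.6f}")
print(f"FPP estimated from distinct keys: {fpp_from_ndv:.3e}")
```

Under these assumed numbers the row-based estimate is close to 1, while
the estimate based on distinct keys is many orders of magnitude smaller,
so a check driven by the row count could disable a filter that is in
fact selective.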

This patch also removes some tests in test_runtime_filters.py that check
cancellation of filters with a high FPP.

Testing:
- Ran and passed core tests.
- Manually tested on a real large cluster (TPC-DS 10TB scale) and
  verified that removing the high-FPP check causes little to no
  performance regression. The TPC-DS queries tested are Q14a, Q50, Q64,
  Q71, Q84, Q93, and a modification of Q93 in which the left outer join
  is replaced with an inner join.

Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M tests/query_test/test_runtime_filters.py
6 files changed, 44 insertions(+), 57 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16499/1
--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
