Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16499 )
Change subject: IMPALA-10112: Remove FpRateTooHigh() check for bloom filter
......................................................................

IMPALA-10112: Remove FpRateTooHigh() check for bloom filter

This patch removes the FpRateTooHigh() check for bloom filters, which
can disable a filter if its observed false-positive probability (FPP)
rate is higher than FLAGS_max_filter_error_rate. A filter with a high
FPP rate is still worth evaluating for several reasons:

1. Partition filters are probably still worth evaluating even if there
   are false positives, because evaluating them is cheap and eliminating
   a partition is still beneficial.
2. Runtime filters are already disabled dynamically on the scan side if
   they are ineffective. An always-true filter is also still evaluated,
   so it is not entirely free either.
3. The disabling is fairly unlikely to kick in for partitioned joins,
   because the check is only applied to a small subset of the filter,
   before the Or() operation.
4. FpRateTooHigh() uses num_build_rows to approximate the actual FPP
   rate of the resulting filter. This can be inaccurate because it does
   not take into account duplicate values of the filter key on the build
   side.

This patch also removes some tests in test_runtime_filters.py that check
cancellation of filters having a high FPP rate.

Testing:
- Ran and passed core tests.
- Manually tested and verified on a real large cluster (TPC-DS 10TB
  scale) that removing the high-FPP check incurs little to no
  performance regression. The TPC-DS queries used are Q14a, Q50, Q64,
  Q71, Q84, Q93, and a modification of Q93 in which the left outer join
  is replaced with an inner join.
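To illustrate point 4, here is a minimal sketch that assumes the textbook Bloom filter estimate (1 - e^(-k*n/m))^k, where n is the number of inserted keys, m the number of bits, and k the number of hash functions. Impala's split-block bloom filter uses its own estimate, and the names estimate_fpp() and fp_rate_too_high(), the 1 MiB filter size, k = 8, and the 0.75 threshold are all hypothetical. Because a duplicate key sets exactly the same bits as its first occurrence, the filter's real fill depends on the number of distinct keys, so plugging in the build-side row count can overstate the FPP by orders of magnitude.

```python
import math


def estimate_fpp(num_keys: float, num_bits: float, num_hashes: int) -> float:
    """Textbook Bloom filter false-positive estimate (1 - e^(-k*n/m))^k.

    Illustrative assumption only: this is not Impala's exact formula.
    """
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


def fp_rate_too_high(num_keys: float, num_bits: float, num_hashes: int,
                     max_error_rate: float) -> bool:
    """Hypothetical stand-in for a check in the spirit of FpRateTooHigh()."""
    return estimate_fpp(num_keys, num_bits, num_hashes) > max_error_rate


NUM_BITS = 8 * 1024 * 1024  # hypothetical filter size: 1 MiB
NUM_HASHES = 8              # hypothetical number of hash functions
MAX_ERROR_RATE = 0.75       # hypothetical value of the error-rate threshold

num_build_rows = 10_000_000  # rows inserted on the build side
ndv = 100_000                # distinct join-key values among those rows

# Estimating from the row count treats every duplicate as a new key.
print("FPP estimate from num_build_rows:",
      estimate_fpp(num_build_rows, NUM_BITS, NUM_HASHES))
# Estimating from the number of distinct keys reflects the bits actually set.
print("FPP estimate from NDV:", estimate_fpp(ndv, NUM_BITS, NUM_HASHES))
print("Disabled by row-based check:",
      fp_rate_too_high(num_build_rows, NUM_BITS, NUM_HASHES, MAX_ERROR_RATE))
```

Under these assumptions the row-based estimate is close to 1, so a check like this would disable the filter, while the NDV-based estimate is vanishingly small.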
Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M tests/query_test/test_runtime_filters.py
6 files changed, 44 insertions(+), 57 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16499/1
--
To view, visit http://gerrit.cloudera.org:8080/16499

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
