Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 )
Change subject: IMPALA-10112: Remove FpRateTooHigh() check for bloom filter ...................................................................... Patch Set 2: (10 comments) Patch set 2 submitted. http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@7 PS1, Line 7: bloo > bloom Done http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@24 PS1, Line 24: > nit: inaccurate Done http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@9 PS1, Line 9: FpRateTooHigh() check for bloom filter that can be disabled if the : observed false-positive probability (FPP) rate is higher than : FLAGS_max_filter_error_rate. This patch remove FpRateTooHigh() check for : several reasons: : : 1. Partition filters are probably still worth evaluating even if there : are false positives because it is cheap and beneficial to eliminate a : partition. : : 2. Runtime filters are dynamically disabled on the scan side if they are : ineffective. Even an ALWAYS_TRUE_FILTER that substitutes a disabled : filter is also still being evaluated and not entirely free. : : 3. The disabling is less likely to kick in for partitioned joins because : it only applied to a small subset of the filters before the Or() : operation. : : 4. FpRateTooHigh() use num_build_rows to approximate the FPP rate of the : resulting filter. This can be inaccurate because it does not take : account of duplicate values of the filter > nit: The grammar here is a little hard to parse, might want to run through Done http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@33 PS1, Line 33: > If you have the performance numbers handy it would be good to include them We happen to revive our 30TB cluster yesterday, so I get to rerun the experiment with larger cluster. Numbers are attached now. http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@a941 PS1, Line 941: > Is the always_true_filter dead code now or is it used elsewhere? It is still being used in cases where an KRPC sidecar failure might happen such as in function RuntimeFilterBank::PublishGlobalFilter() http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@927 PS1, Line 927: for (const FilterContext& ctx : filter_ctxs_) { > Comment is out of date. Done http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@936 PS1, Line 936: > Comment is out of date Done http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test File testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test: http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test@11 PS1, Line 11: #################################################### > You can remove the commented out test. It makes sense to leave the descript Done http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test File testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test: PS1: > Maybe just delete this file and the python test that uses it Done http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py@195 PS1, Line 195: > Just delete the test, no need to keep it around, we can always restore it f Done -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 24 Sep 2020 17:41:39 +0000 Gerrit-HasComments: Yes
