[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 ) Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter .. Patch Set 1: (4 comments) Thanks for implementing and testing this out Riza! http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@24 PS1, Line 24: inacurate nit: inaccurate http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@9 PS1, Line 9: This patch remove FpRateTooHigh() check for bloom filter that can : disable filter if the observed false-positive probability (FPP) rate is : higher than FLAGS_max_filter_error_rate. Such filter with high FPP rate : is still worth to evaluate for several reasons: : : 1. Partition filters are probably still worth evaluating even if there :are false positives, because it's cheap and eliminating a partition :is still beneficial. : 2. Runtime filters are dynamically disabled on the scan side if they are :ineffective. An always true filter is also still being evaluated and :not entirely free. : 3. The disabling is fairly unlikely to kick in for partitioned joins :because it's only applied to a small subset of the filter, before the :Or() operation. : 4. FpRateTooHigh() use num_build_rows to approximate actual FPP rate of :resulting filter. This can be inacurate because it does not take :account of duplicate values of the filter key on the build side. : : This patch also remove some tests in test_runtime_filters.py that check : cancellation of filters having high FPP rate. nit: The grammar here is a little hard to parse, might want to run through it again or pass it through a grammar checker? http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@33 PS1, Line 33: little to no performance regression If you have the performance numbers handy it would be good to include them in the commit message to aid any future readers; however it's not critical if rerunning the benchmark is required. http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@a941 PS1, Line 941: Is the always_true_filter dead code now or is it used elsewhere? -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 03:36:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 ) Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@7 PS1, Line 7: blom bloom -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 02:03:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 ) Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@927 PS1, Line 927: // Use 'num_build_rows' to estimate FP-rate of each Bloom filter, and publish Comment is out of date. http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@936 PS1, Line 936: // TODO: Consider checking this every few batches or so. Comment is out of date http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test File testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test: http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test@11 PS1, Line 11: #SET RUNTIME_FILTER_WAIT_TIME_MS=3; You can remove the commented out test. It makes sense to leave the description of the test case about as a tombstone for the test, but I don't think we need the result of it. http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test File testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test: PS1: Maybe just delete this file and the python test that uses it http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py@195 PS1, Line 195: # IMPALA-10112: This test is disabled because high FP rate check has been removed Just delete the test, no need to keep it around, we can always restore it from version control. -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 01:54:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 ) Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7259/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 21:32:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16499 Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter .. IMPALA-10112: Remove FpRateTooHigh() check for blom filter This patch remove FpRateTooHigh() check for bloom filter that can disable filter if the observed false-positive probability (FPP) rate is higher than FLAGS_max_filter_error_rate. Such filter with high FPP rate is still worth to evaluate for several reasons: 1. Partition filters are probably still worth evaluating even if there are false positives, because it's cheap and eliminating a partition is still beneficial. 2. Runtime filters are dynamically disabled on the scan side if they are ineffective. An always true filter is also still being evaluated and not entirely free. 3. The disabling is fairly unlikely to kick in for partitioned joins because it's only applied to a small subset of the filter, before the Or() operation. 4. FpRateTooHigh() use num_build_rows to approximate actual FPP rate of resulting filter. This can be inacurate because it does not take account of duplicate values of the filter key on the build side. This patch also remove some tests in test_runtime_filters.py that check cancellation of filters having high FPP rate. Testing: - Run and pass core tests. - Manually test and verify in real large cluster (TPC-DS 10TB scale) that there is only a little to no performance regression incurred from the removal of high FPP check. TPC-DS queries used to test are Q14a, Q50, Q64, Q71, Q84, Q93, and modification of Q93 where we replace the left outer join with inner join. Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test M tests/query_test/test_runtime_filters.py 6 files changed, 44 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16499/1 -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto