Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 )
Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter ...................................................................... Patch Set 1: (4 comments) Thanks for implementing and testing this out Riza! http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@24 PS1, Line 24: inacurate nit: inaccurate http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@9 PS1, Line 9: This patch remove FpRateTooHigh() check for bloom filter that can : disable filter if the observed false-positive probability (FPP) rate is : higher than FLAGS_max_filter_error_rate. Such filter with high FPP rate : is still worth to evaluate for several reasons: : : 1. Partition filters are probably still worth evaluating even if there : are false positives, because it's cheap and eliminating a partition : is still beneficial. : 2. Runtime filters are dynamically disabled on the scan side if they are : ineffective. An always true filter is also still being evaluated and : not entirely free. : 3. The disabling is fairly unlikely to kick in for partitioned joins : because it's only applied to a small subset of the filter, before the : Or() operation. : 4. FpRateTooHigh() use num_build_rows to approximate actual FPP rate of : resulting filter. This can be inacurate because it does not take : account of duplicate values of the filter key on the build side. : : This patch also remove some tests in test_runtime_filters.py that check : cancellation of filters having high FPP rate. nit: The grammar here is a little hard to parse, might want to run through it again or pass it through a grammar checker? http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@33 PS1, Line 33: little to no performance regression If you have the performance numbers handy it would be good to include them in the commit message to aid any future readers; however it's not critical if rerunning the benchmark is required. http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@a941 PS1, Line 941: Is the always_true_filter dead code now or is it used elsewhere? -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 24 Sep 2020 03:36:01 +0000 Gerrit-HasComments: Yes