Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16499 )

Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter
......................................................................


Patch Set 1:

(4 comments)

Thanks for implementing and testing this out Riza!

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@24
PS1, Line 24: inacurate
nit: inaccurate


http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@9
PS1, Line 9: This patch remove FpRateTooHigh() check for bloom filter that can
           : disable filter if the observed false-positive probability (FPP) 
rate is
           : higher than FLAGS_max_filter_error_rate. Such filter with high FPP 
rate
           : is still worth to evaluate for several reasons:
           :
           : 1. Partition filters are probably still worth evaluating even if 
there
           :    are false positives, because it's cheap and eliminating a 
partition
           :    is still beneficial.
           : 2. Runtime filters are dynamically disabled on the scan side if 
they are
           :    ineffective. An always true filter is also still being 
evaluated and
           :    not entirely free.
           : 3. The disabling is fairly unlikely to kick in for partitioned 
joins
           :    because it's only applied to a small subset of the filter, 
before the
           :    Or() operation.
           : 4. FpRateTooHigh() use num_build_rows to approximate actual FPP 
rate of
           :    resulting filter. This can be inacurate because it does not take
           :    account of duplicate values of the filter key on the build side.
           :
           : This patch also remove some tests in test_runtime_filters.py that 
check
           : cancellation of filters having high FPP rate.
nit: The grammar here is a little hard to parse, might want to run through it 
again or pass it through a grammar checker?


http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@33
PS1, Line 33: little to no performance regression
If you have the performance numbers handy it would be good to include them in 
the commit message to aid any future readers; however it's not critical if 
rerunning the benchmark is required.


http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@a941
PS1, Line 941:
Is the always_true_filter dead code now or is it used elsewhere?



--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Sep 2020 03:36:01 +0000
Gerrit-HasComments: Yes

Reply via email to