Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16499 )

Change subject: IMPALA-10112: Remove FpRateTooHigh() check for bloom filter
......................................................................


Patch Set 2:

(10 comments)

Patch set 2 submitted.

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@7
PS1, Line 7: bloo
> bloom
Done


http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@24
PS1, Line 24:
> nit: inaccurate
Done


http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@9
PS1, Line 9: FpRateTooHigh() check for bloom filter that can be disabled if the
           : observed false-positive probability (FPP) rate is higher than
           : FLAGS_max_filter_error_rate. This patch remove FpRateTooHigh() 
check for
           : several reasons:
           :
           : 1. Partition filters are probably still worth evaluating even if 
there
           :    are false positives because it is cheap and beneficial to 
eliminate a
           :    partition.
           :
           : 2. Runtime filters are dynamically disabled on the scan side if 
they are
           :    ineffective. Even an ALWAYS_TRUE_FILTER that substitutes a 
disabled
           :    filter is also still being evaluated and not entirely free.
           :
           : 3. The disabling is less likely to kick in for partitioned joins 
because
           :    it only applied to a small subset of the filters before the Or()
           :    operation.
           :
           : 4. FpRateTooHigh() use num_build_rows to approximate the FPP rate 
of the
           :    resulting filter. This can be inaccurate because it does not 
take
           :    account of duplicate values of the filter
> nit: The grammar here is a little hard to parse, might want to run through
Done


http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@33
PS1, Line 33:
> If you have the performance numbers handy it would be good to include them
We happen to revive our 30TB cluster yesterday, so I get to rerun the 
experiment with larger cluster. Numbers are attached now.


http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@a941
PS1, Line 941:
> Is the always_true_filter dead code now or is it used elsewhere?
It is still being used in cases where an KRPC sidecar failure might happen such 
as in function RuntimeFilterBank::PublishGlobalFilter()


http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@927
PS1, Line 927:   for (const FilterContext& ctx : filter_ctxs_) {
> Comment is out of date.
Done


http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@936
PS1, Line 936: 
> Comment is out of date
Done


http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
File testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test:

http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test@11
PS1, Line 11: ####################################################
> You can remove the commented out test. It makes sense to leave the descript
Done


http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
File 
testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test:

PS1:
> Maybe just delete this file and the python test that uses it
Done


http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py@195
PS1, Line 195:
> Just delete the test, no need to keep it around, we can always restore it f
Done



--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 24 Sep 2020 17:41:39 +0000
Gerrit-HasComments: Yes

Reply via email to