[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter

2020-09-23 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16499 )

Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter
..


Patch Set 1:

(4 comments)

Thanks for implementing and testing this out Riza!

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@24
PS1, Line 24: inacurate
nit: inaccurate


http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@9
PS1, Line 9: This patch remove FpRateTooHigh() check for bloom filter that can
   : disable filter if the observed false-positive probability (FPP) 
rate is
   : higher than FLAGS_max_filter_error_rate. Such filter with high FPP 
rate
   : is still worth to evaluate for several reasons:
   :
   : 1. Partition filters are probably still worth evaluating even if 
there
   :are false positives, because it's cheap and eliminating a 
partition
   :is still beneficial.
   : 2. Runtime filters are dynamically disabled on the scan side if 
they are
   :ineffective. An always true filter is also still being 
evaluated and
   :not entirely free.
   : 3. The disabling is fairly unlikely to kick in for partitioned 
joins
   :because it's only applied to a small subset of the filter, 
before the
   :Or() operation.
   : 4. FpRateTooHigh() use num_build_rows to approximate actual FPP 
rate of
   :resulting filter. This can be inacurate because it does not take
   :account of duplicate values of the filter key on the build side.
   :
   : This patch also remove some tests in test_runtime_filters.py that 
check
   : cancellation of filters having high FPP rate.
nit: The grammar here is a little hard to parse, might want to run through it 
again or pass it through a grammar checker?


http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@33
PS1, Line 33: little to no performance regression
If you have the performance numbers handy it would be good to include them in 
the commit message to aid any future readers; however it's not critical if 
rerunning the benchmark is required.


http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@a941
PS1, Line 941:
Is the always_true_filter dead code now or is it used elsewhere?



--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 03:36:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter

2020-09-23 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16499 )

Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@7
PS1, Line 7: blom
bloom



--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 02:03:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter

2020-09-23 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16499 )

Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@927
PS1, Line 927:   // Use 'num_build_rows' to estimate FP-rate of each Bloom 
filter, and publish
Comment is out of date.


http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@936
PS1, Line 936: // TODO: Consider checking this every few batches or so.
Comment is out of date


http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
File testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test:

http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test@11
PS1, Line 11: #SET RUNTIME_FILTER_WAIT_TIME_MS=3;
You can remove the commented out test. It makes sense to leave the description 
of the test case about as a tombstone for the test, but I don't think we need 
the result of it.


http://gerrit.cloudera.org:8080/#/c/16499/1/testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
File 
testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test:

PS1:
Maybe just delete this file and the python test that uses it


http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/16499/1/tests/query_test/test_runtime_filters.py@195
PS1, Line 195:   # IMPALA-10112: This test is disabled because high FP rate 
check has been removed
Just delete the test, no need to keep it around, we can always restore it from 
version control.



--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 01:54:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16499 )

Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7259/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 21:32:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter

2020-09-23 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16499


Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter
..

IMPALA-10112: Remove FpRateTooHigh() check for blom filter

This patch remove FpRateTooHigh() check for bloom filter that can
disable filter if the observed false-positive probability (FPP) rate is
higher than FLAGS_max_filter_error_rate. Such filter with high FPP rate
is still worth to evaluate for several reasons:

1. Partition filters are probably still worth evaluating even if there
   are false positives, because it's cheap and eliminating a partition
   is still beneficial.
2. Runtime filters are dynamically disabled on the scan side if they are
   ineffective. An always true filter is also still being evaluated and
   not entirely free.
3. The disabling is fairly unlikely to kick in for partitioned joins
   because it's only applied to a small subset of the filter, before the
   Or() operation.
4. FpRateTooHigh() use num_build_rows to approximate actual FPP rate of
   resulting filter. This can be inacurate because it does not take
   account of duplicate values of the filter key on the build side.

This patch also remove some tests in test_runtime_filters.py that check
cancellation of filters having high FPP rate.

Testing:
- Run and pass core tests.
- Manually test and verify in real large cluster (TPC-DS 10TB scale)
  that there is only a little to no performance regression incurred from
  the removal of high FPP check. TPC-DS queries used to test are Q14a,
  Q50, Q64, Q71, Q84, Q93, and modification of Q93 where we replace the
  left outer join with inner join.

Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M tests/query_test/test_runtime_filters.py
6 files changed, 44 insertions(+), 57 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16499/1
--
To view, visit http://gerrit.cloudera.org:8080/16499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4
Gerrit-Change-Number: 16499
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto