[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 26:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6224/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 26
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 05 Jun 2020 18:34:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degradation for Init function, but this function is
   not in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Reviewed-on: http://gerrit.cloudera.org:8080/15683
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,024 insertions(+), 661 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 27
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-05 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#26). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degradation for Init function, but this function is
   not in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,024 insertions(+), 661 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/26
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 26
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 24:

Could you do a pass over this with clang-format-diff? It's ready to be +2ed 
except for a few formatting issues.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 24
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 04 Jun 2020 17:37:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 24:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6202/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 24
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 03 Jun 2020 14:24:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 23:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6201/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 23
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 03 Jun 2020 14:06:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-03 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#24). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,017 insertions(+), 651 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/24
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 24
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-03 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#23). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,017 insertions(+), 651 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/23
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 23
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-03 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 22:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py
File be/src/codegen/gen_ir_descriptions.py:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py@204
PS22, Line 204: ["RAW_VALUE_GET_HASH_VALUE",
> I don't think this is being used anywhere anymore, so it can be removed.
Right, will remove this entry.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc@247
PS22, Line 247: NULL
> nit: we prefer nullptr, here and elsewhere
Will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h
File be/src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h@146
PS22, Line 146: // It's possible that log_heap_size is less than 
kLogBucketWordBits in unit test.
> We definitely want to avoid making any changes to the be/src/kudu directory
Will change bloom-filter-test.cc to avoid hitting this error.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@86
PS22, Line 86: GetHashValueFastHash32
> I think you mean 'GetHashValueFastHash', i.e. this function doesn't return
Right, fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@93
PS22, Line 93: GetHashValueFastHash32NonNull
> Same here - this doesn't return a 32-bit value, so I think you meant to say
will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc@34
PS22, Line 34: // kudu::BlockBloomFilter::kCpu is a static variable and is 
initialized once.
 : // To temporarily disable AVX2 for Bloom Filter in runtime 
testing, set flag
 : // disable_blockbloomfilter_avx2 as true. See 
kudu::BlockBloomFilter::has_avx2().
 : // kudu::BlockBloomFilter::has_avx2() is called by 
kudu::BlockBloomFilter
 : // constructor and 
kudu::BlockBloomFilter::OrEqualArrayInternal(). Impala
 : // testing code should set this flag before creating 
impala::BloomFilter object
 : // which affect impala::BloomFilter::Insert() and 
impala::BloomFilter::Find(),
 : // or before calling impala::BloomFilter::or().
 : // This flag has no effect if the target CPU doesn't support 
AVX2.
> This comment is a lot longer than necessary. Probably fine to just say some
will make it short.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@40
PS22, Line 40: class BlockBloomFilter;
 : class BlockBloomFilterBufferAllocatorIf;
 : class Slice;
> nit: I think currently as written none of these are needed, since Slice isn
will remove these three lines.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@70
PS22, Line 70: class ImpalaBloomFilterBufferAllocator : public 
kudu::BlockBloomFilterBufferAllocatorIf {
> I think this is now only referenced in bloom-filter.cc, so we can just move
A buffer allocator is defined as regular member variable in BloomFilter class. 
If moving this class definition to bloom-filter.cc, we will get a compilation 
error which complains  incomplete type for the allocator member variable.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@210
PS22, Line 210:  *
> nit: formatting
will fix it


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@213
PS22, Line 213: /// Buffer allocator is owned by the bloom filter.
> This comment seems weird. Maybe say something like "used by kudu to allocat
will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@215
PS22, Line 215: e
> nit: 'E'
Will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@35
PS22, Line 35: #include "util/kudu-status-util.h"
> nit: please alphabetize
Will fix it


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@49
PS22, Line 49: delete
> In general, we try to avoid using 'delete' as its error prone.
will change it as regular member variable.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@52
PS22, Line 52: // Kudu only support FastHash for 

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-02 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 22:

(23 comments)

The patch is looking pretty good, thanks for all the work on this!

Mostly just some nits and additional testing left.

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py
File be/src/codegen/gen_ir_descriptions.py:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py@204
PS22, Line 204: ["RAW_VALUE_GET_HASH_VALUE",
I don't think this is being used anywhere anymore, so it can be removed.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc@247
PS22, Line 247: NULL
nit: we prefer nullptr, here and elsewhere


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h
File be/src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h@146
PS22, Line 146: // It's possible that log_heap_size is less than 
kLogBucketWordBits in unit test.
We definitely want to avoid making any changes to the be/src/kudu directory if 
possible, since it makes upgrading the newer versions trickier. Is it possible 
to rewrite the tests to not trigger this?


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@86
PS22, Line 86: GetHashValueFastHash32
I think you mean 'GetHashValueFastHash', i.e. this function doesn't return a 
32-bit value.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@93
PS22, Line 93: GetHashValueFastHash32NonNull
Same here - this doesn't return a 32-bit value, so I think you meant to say 
GetHashValueFastHashNonNull


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc@34
PS22, Line 34: // kudu::BlockBloomFilter::kCpu is a static variable and is 
initialized once.
 : // To temporarily disable AVX2 for Bloom Filter in runtime 
testing, set flag
 : // disable_blockbloomfilter_avx2 as true. See 
kudu::BlockBloomFilter::has_avx2().
 : // kudu::BlockBloomFilter::has_avx2() is called by 
kudu::BlockBloomFilter
 : // constructor and 
kudu::BlockBloomFilter::OrEqualArrayInternal(). Impala
 : // testing code should set this flag before creating 
impala::BloomFilter object
 : // which affect impala::BloomFilter::Insert() and 
impala::BloomFilter::Find(),
 : // or before calling impala::BloomFilter::or().
 : // This flag has no effect if the target CPU doesn't support 
AVX2.
This comment is a lot longer than necessary. Probably fine to just say 
something like "This flag is used in Kudu to temporarily disable avx2 support 
for testing purposes"


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@40
PS22, Line 40: class BlockBloomFilter;
 : class BlockBloomFilterBufferAllocatorIf;
 : class Slice;
nit: I think currently as written none of these are needed, since Slice isn't 
used here and you're importing block_bloom_filter anyways.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@70
PS22, Line 70: class ImpalaBloomFilterBufferAllocator : public 
kudu::BlockBloomFilterBufferAllocatorIf {
I think this is now only referenced in bloom-filter.cc, so we can just move the 
definition there.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@210
PS22, Line 210:  *
nit: formatting


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@213
PS22, Line 213: /// Buffer allocator is owned by the bloom filter.
This comment seems weird. Maybe say something like "used by kudu to allocate 
memory for 'block_bloom_filter_'


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@215
PS22, Line 215: e
nit: 'E'

and please leave an extra line between this and the definition above


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@35
PS22, Line 35: #include "util/kudu-status-util.h"
nit: please alphabetize


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@49
PS22, Line 49: delete
In general, we try to avoid using 'delete' as its error prone.

I'm not sure if there's a reason why it needs to be a pointer and not just a be 
a 

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6186/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 01 Jun 2020 22:21:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-01 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,524 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/22
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-01 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h@73
PS21, Line 73: use
> uses
fixed


http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc@202
PS21, Line 202:   BufferPool *buffer_pool = 
ExecEnv::GetInstance()->buffer_pool();
  :   DCHECK(buffer_pool_client_ != nullptr);
> With this implementation, a check that is_allocated is not true is needed t
will call close() first.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 01 Jun 2020 21:24:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-06-01 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h@73
PS21, Line 73: use
uses


http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc@202
PS21, Line 202:   BufferPool *buffer_pool = 
ExecEnv::GetInstance()->buffer_pool();
  :   DCHECK(buffer_pool_client_ != nullptr);
With this implementation, a check that is_allocated is not true is needed to 
make sure previous allocation is not leaked?



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 01 Jun 2020 19:19:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6166/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 29 May 2020 18:25:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-29 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all core tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,522 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/21
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6133/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 23 May 2020 22:54:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-23 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all core tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,523 insertions(+), 623 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/20
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6121/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 21 May 2020 01:21:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-20 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Testing:
 - Passed all core tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A 
testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
38 files changed, 1,533 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/18
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 17:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6116/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 20 May 2020 18:43:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-20 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,380 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/17
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-14 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc
File be/src/benchmarks/bloom-filter-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc@64
PS12, Line 64: // Machine Info: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
> Reran the the bloom filter benchmark on my desktop before and after applyin
Sounds good; thanks!



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 14 May 2020 14:02:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 14:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6058/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 14 May 2020 07:10:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-14 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
34 files changed, 1,420 insertions(+), 633 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/14
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-11 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/exec/kudu-scanner.cc@252
PS12, Line 252: ImpalaBloomFilterBufferAllocator* allocator =
  : new ImpalaBloomFilterBufferAllocator(
  : state_->filter_bank()->buffer_pool_client());
This doesn't look right.
The allocator that is used to create BBF must be passed down to kudu.
In current impl, the one stored in impala BF must be passed down. 
https://gerrit.cloudera.org/c/15683/12/be/src/util/bloom-filter.h#214



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 11 May 2020 23:15:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-10 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc
File be/src/benchmarks/bloom-filter-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc@64
PS12, Line 64: // Machine Info: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
I'd suggest re-running these on your machine before and after applying this 
patch, to see if there are any regressions. There are an indirection or two 
that hopefully will be inlined away, but I'm not sure.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 11 May 2020 03:19:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 12:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6016/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 11 May 2020 01:48:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-10 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
34 files changed, 1,339 insertions(+), 619 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/12
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-09 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@71
PS11, Line 71: if (directory_in_size == 0)
> Could you clarify with a comment in what case would directory_in_size be 0
wdirectory_size equal 0 only when the fileter is always false. In the case the 
bloom filter will not be pushed to Kudu. Will add a comment here.


http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@201
PS11, Line 201:   if (is_allocated_) {
  : LOG(DFATAL) << "Each call to AllocateBuffer must have a 
corresponding call "
  : << "to FreeBuffer.";
  : Close(); // Ensure that any previously allocated memory is 
released.
  :   }
> Same instance of allocator can be used for multiple BBFs. So this doesn't l
Line 201 to 205 are not useful. Will remove them.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 09 May 2020 17:03:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-07 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 11:

(2 comments)

Apologize for not taking a look earlier. Lot of the changes looked specific to 
Impala.
I mainly looked at Impala BBF code that was updated to invoke one from 
kudu-util.

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@71
PS11, Line 71: if (directory_in_size == 0)
Could you clarify with a comment in what case would directory_in_size be 0 
since this Init variant should really be used for deserialization case.


http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@201
PS11, Line 201:   if (is_allocated_) {
  : LOG(DFATAL) << "Each call to AllocateBuffer must have a 
corresponding call "
  : << "to FreeBuffer.";
  : Close(); // Ensure that any previously allocated memory is 
released.
  :   }
Same instance of allocator can be used for multiple BBFs. So this doesn't look 
right to me.

With that in mind, I don't see need for is_allocated_member variable. 
ImpalaBloomFilterBufferAllocator should be a simple wrapper on top of Impala's 
default buffer pool and should be implemented as a singleton.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 08 May 2020 00:11:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 11:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5995/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 07 May 2020 18:47:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-07 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
34 files changed, 1,344 insertions(+), 620 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/11
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-07 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 9:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt
File be/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234
PS9, Line 234:   "-DKUDU_HEADERS_NO_STUBS"
> nit formatting: trailing whitespace and you can wrap this into the rest of
will fix as suggested.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91
PS9, Line 91: //local_bloom_filter->Insert(val, expr_eval->root().type());
> I guess this was left by accident?
Right, will remove these two lines.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106
PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> 
n_regionkey
> This needs to be updated (though of course it will only change slightly). I
will add LlvmCodeGen::Print(*fn) at the end of function to dump the codegen 
functions, then update the sample in the functions' comments.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232
PS9, Line 232: }
 :   else if
> nit: formatting ('else if' should go on the same line as prior '}')
Will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234
PS9, Line 234: continue;
> Brief comment, eg. 'This filter won't actually remove any rows so we don't
will add comments


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236
PS9, Line 236: else
> nit: it doesn't actually change how the code works, but it would be better
will remove "else"


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238
PS9, Line 238: if (filter != nullptr) {
> I think we can save ourselves some duplication and an indention level by ch
Agree, will change the code.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239
PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find(
 :   scan_node_->id());
 :   const TRuntimeFilterTargetDesc _desc =
 :   ctx.filter->filter_desc().targets[it->second];
 :   const string _name = target_desc.kudu_col_name;
 :   DCHECK(col_name != "");
> This is duplicated below, I think we can eliminate the duplication by movin
That's right. Will remove the duplication.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53
PS9, Line 53: class BloomFilterBufferAllocator;
> I don't think this is necessary
will remove it.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73
PS9, Line 73: BloomFilterBufferAllocator
> nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no
Will rename it as suggested


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91
PS9, Line 91:   std::shared_ptr 
Clone() const override {
> If this is something that's really only used in Kudu's internal testing and
Will add LOG(FATAL) and simplify the logic of allocate/close functions.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111
PS9, Line 111: /// A BloomFilter stores sets of items and offers a query 
operation indicating whether or
> This class comment could probably use some updating to reflect the fact tha
Will update the comments.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171
PS9, Line 171:   void Insert(void* val, const ColumnType& col_type) noexcept;
> Is this version of Insert() used anywhere? Same with the equivalent Find()
Will remove them


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241
PS9, Line 241: HashAlgorithm hash_algorithm_;
> As far as I can tell, this isn't actually getting used anywhere, and I don'
Will remove it.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@205
PS9, Line 205:   if (is_allocated_) Close();
> I suspect its probably supposed to be guaranteed that FreeBuffer() is alway
Will add LOG(DFATAL).


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@226
PS9, Line 226:   Close(); // Ensure that any previously allocated memory is 
released.
> I wonder if this is really necessary - I would hope that Kudu provides a gu
Will LOG(DFALTAL) if is allocated.



[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 9:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt
File be/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234
PS9, Line 234:   "-DKUDU_HEADERS_NO_STUBS"
nit formatting: trailing whitespace and you can wrap this into the rest of the 
string instead of putting it on its own line


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91
PS9, Line 91: //local_bloom_filter->Insert(val, expr_eval->root().type());
I guess this was left by accident?


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106
PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> 
n_regionkey
This needs to be updated (though of course it will only change slightly). If 
you're not sure how to get this, see: 
https://cwiki.apache.org/confluence/display/IMPALA/Codegen

(the easiest way is probably just to add a line at the end of this function 
that logs fn->dump() or whatever)


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232
PS9, Line 232: }
 :   else if
nit: formatting ('else if' should go on the same line as prior '}')

Not sure if I've bugged you about clang-format before, but its a great way to 
help ensure that you're code reviews don't have these sorts of simple errors. 
See: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536

Note that clang-format is just a guide and if it's suggestion differs from 
surrounding code then you can feel free to ignore it


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234
PS9, Line 234: continue;
Brief comment, eg. 'This filter won't actually remove any rows so we don't need 
to push it down to Kudu'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236
PS9, Line 236: else
nit: it doesn't actually change how the code works, but it would be better to 
leave off this 'else' since its not actually mutually exclusive with the prior 
'if's


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238
PS9, Line 238: if (filter != nullptr) {
I think we can save ourselves some duplication and an indention level by 
changing this to 'ctx.filter->HasFilter()' and moving it up to the 
'if(AlwaysTrue())'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239
PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find(
 :   scan_node_->id());
 :   const TRuntimeFilterTargetDesc _desc =
 :   ctx.filter->filter_desc().targets[it->second];
 :   const string _name = target_desc.kudu_col_name;
 :   DCHECK(col_name != "");
This is duplicated below, I think we can eliminate the duplication by moving 
this up to before the 'if(is_bloom)'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53
PS9, Line 53: class BloomFilterBufferAllocator;
I don't think this is necessary


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73
PS9, Line 73: BloomFilterBufferAllocator
nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no 
confusion


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91
PS9, Line 91:   std::shared_ptr 
Clone() const override {
If this is something that's really only used in Kudu's internal testing and 
won't ever get hit in Impala, it might be better to just do a LOG(FATAL) here 
or something, since then you would be able to simplify the logic of 
AllocateBuffer()/Close()/etc. and not support a code path that's never getting 
hit.

Probably have to check with Bankim to ensure that's really the case, though


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111
PS9, Line 111: /// A BloomFilter stores sets of items and offers a query 
operation indicating whether or
This class comment could probably use some updating to reflect the fact that 
this class is now basically a thin wrapper around Kudu's bloom filter


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171
PS9, Line 171:   void Insert(void* val, const ColumnType& col_type) noexcept;
Is this version of Insert() used anywhere? Same with the equivalent Find() below


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241
PS9, 

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 9:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5939/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 03 May 2020 07:08:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-05-03 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
35 files changed, 1,472 insertions(+), 557 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/9
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 8:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5920/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 01 May 2020 00:11:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.
Upgraded Kudu version to 389d4f1e1 to pull in Kudu Bloom Filter support
which is needed for the Kudu/Impala Bloom Filter integration.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
36 files changed, 1,469 insertions(+), 560 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/8
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5877/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 25 Apr 2020 05:07:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-24 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types. By default,
both bloom filter and min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
33 files changed, 1,445 insertions(+), 522 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/7
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 6:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5860/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 23 Apr 2020 08:00:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-23 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set enabled runtime filter types. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M tests/query_test/test_runtime_filters.py
29 files changed, 1,262 insertions(+), 365 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/6
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-23 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124
PS4, Line 124: MIN_MAX = 2
> It's possible that the comma separated list thing is a bad idea and won't w
Query option setting is treated i as a string in impala-shell. We can set the 
option value with comma in impala-shell. But we will get parsing error in 
function impala::ParseQueryOptions() since comma is used as separator of a list 
of query option KV string, like "runtime_filter_min_size=1024, 
runtime_filter_max_size=65536". If we set "enabled_runtime_filter_type=bloom, 
min_max", the function parse the string as two KV string  
"enabled_runtime_filter_type=bloom," and "min_max". It get parsing error when 
parsing second KV string "min_max" since there is no "=" in the KV string.  As 
discussed offline, we will not support comma separated filter types. Instead, 
rename BLOOM_MIN_MAX to ALL.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 23 Apr 2020 06:37:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-20 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124
PS4, Line 124: MIN_MAX = 2
> The code for showing the current settings in impala-shell are auto generate
It's possible that the comma separated list thing is a bad idea and won't work, 
but I'm not sure why the shell would be affected by it. I think it should be 
possible to just treat is as a string from the perspective of impala-shell, its 
only necessary to break it up into a list internally in Impala.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 20 Apr 2020 18:56:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-17 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h@80
PS4, Line 80: ublic:
> unnecessary
It's defined to support the "clone" virtual function in base class.


http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h@91
PS4, Line 91:   std::shared_ptr 
Clone() const override {
> I don't understand what this is supposed to do - why wouldn't the cloned al
Yes, we cannot get the buffer pool from runtime filter bank without calling 
context.


http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift@521
PS4, Line 521:
> Seems a little confusing, since users don't specify the numbers they use "B
Fixed


http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift@524
PS4, Line 524: R_TYP
> We'll eventually want to apply min-max filters in other places like HDFS sc
Fixed


http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124
PS4, Line 124: MIN_MAX = 2
> We probably eventually want to add in-list filters as well, and we won't wa
The code for showing the current settings in impala-shell are auto generated. 
Still try to figure out how to show the setting for enabled filter type as 
comma separated list of filter type.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 17 Apr 2020 18:57:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5817/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 17 Apr 2020 07:14:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/5/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/15683/5/tests/query_test/test_runtime_filters.py@31
PS5, Line 31: from tests.common.test_dimensions import (
flake8: F401 'tests.common.test_dimensions.create_exec_option_dimension' 
imported but unused



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 17 Apr 2020 07:04:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-17 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set enabled runtime filter types. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M tests/query_test/test_runtime_filters.py
28 files changed, 834 insertions(+), 362 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/5
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-10 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Never mind the question above 'In that case would the bloom filter be redun
Thanks for your comments. I will try to add some heuristic in planner when 
assign filter to Kudu scanner node. By default, we assign both types of filter 
to Kudu.


http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test:

http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@667
PS3, Line 667: # RUNTIME_FILTER_SCHEME_KUDU is set as BLOOM, Bloom filter is 
assigned
> One additional useful test is with a SEMI-JOIN, since it is a common patter
Will add the suggested case.


http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@722
PS3, Line 722: RUNTIME_FILTER_SCHEME_KUDU=BLOOM_MIN_MAX
> Suggest setting the explain_level=2 such that the runtime filter shows what
Will change the explain_level=2 as suggested.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 18:59:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-10 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3: Code-Review+1

(2 comments)

Couple of suggestions about the tests, otherwise the planner side changes LGTM. 
I did look at the backend changes but would be better for someone else to 
review it too.

http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test:

http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@667
PS3, Line 667: # RUNTIME_FILTER_SCHEME_KUDU is set as BLOOM, Bloom filter is 
assigned
One additional useful test is with a SEMI-JOIN, since it is a common pattern :
SELECT COUNT(*) FROM alltypes a WHERE a.id IN (SELECT b.id FROM alltypes b 
WHERE b.int_col < 10);


http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@722
PS3, Line 722: RUNTIME_FILTER_SCHEME_KUDU=BLOOM_MIN_MAX
Suggest setting the explain_level=2 such that the runtime filter shows what 
type of filter it is 'bloom' or 'min-max'.  e.g RF000[bloom]



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 18:22:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-10 Thread Aman Sinha (Code Review)
Aman Sinha has removed a vote on this change.

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Removed Code-Review+1 by Aman Sinha 
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-10 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Thanks Bankim, Todd for the.illustrative explanation.   Agreed, for the (co
Never mind the question above 'In that case would the bloom filter be redundant 
?'.   I must have been thinking of the FALSE case for the first predicate.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 14:49:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-09 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> BTW forgot that I took some measurements of the ApplyPredicatePrimitive per
Thanks Bankim, Todd for the.illustrative explanation.   Agreed, for the 
(compound) primary keys, the min-max predicate is well suited for range pruning 
- that was my thought too in my initial comment.  For #3, I didn't think it 
would provide much benefit but your numbers look compelling enough for the 
'with predicate' case that it seems worth it.  It sounds like Todd, you would 
even prefer evaluating that first. In that case, would the bloom filter be 
redundant ?  It sounds like current behavior (from Bankim's comment) is the BF 
is evaluated first.

In any case, yeah on further thought, the overhead of sending both is not too 
much and we could leave the decision making to Kudu.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 05:47:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

Sorry for repeated comments, but one more thought occurred me to me: the most 
effective case of min-max filter is when min == max, and thus the filter 
becomes an equality. An equality predicate can then serve to prune Kudu hash 
partitions, which isn't doable with bloom or range predicates.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 04:12:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Right, sortedness in kudu is based on the (compound) primary key. So, min-m
BTW forgot that I took some measurements of the ApplyPredicatePrimitive 
performance bfeore:

 int8   NOT NULL   (c >= 0 AND c < 2) 3152.2M evals/sec 0.88 cycles/eval
 int16  NOT NULL   (c >= 0 AND c < 2) 3309.6M evals/sec 0.85 cycles/eval
 int32  NOT NULL   (c >= 0 AND c < 2) 3384.0M evals/sec 0.85 cycles/eval
 int64  NOT NULL   (c >= 0 AND c < 2) 1847.6M evals/sec 1.57 cycles/eval
 float  NOT NULL   (c >= 0 AND c < 2) 3268.3M evals/sec 0.88 cycles/eval
 double NOT NULL   (c >= 0 AND c < 2) 2245.2M evals/sec 1.27 cycles/eval



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 03:56:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> This is my understanding, I'll check further within Kudu team.
Right, sortedness in kudu is based on the (compound) primary key. So, min-max 
predicates can help in the following ways:
(1) eliminate range partitions -- eg a min/max on a timestamp could would help 
a lot if the table is timestamp-partitioned by day, whereas a bloom filter on 
timestamp is probably not useful since cardinality is very high

(2) convert to primary key-based range scans -- if we have inequality/equality 
predicates on a prefix of the primary key, we can convert those predicates into 
a range scan on the table using b-tree style indexes. Again if you had a 
min-max on timestamp and timestamp were the leading portion of your key, it's a 
big win. Or, if you had a min-max on timestamp, a composite primary key of 
(entity_id, timestamp), and an equality predicate on entity_id, it would still 
do the PK range scan conversion for a big win.

(3) normal row-by-row evaluation - this is still useful since, particularly for 
primitive columns, it's very fast to evaluate (SIMD-accelerated in latest 
release, etc). It's likely much faster to evaluate than a bloom filter which 
will probably have some frequency of L1 cache misses if not L2/LLC, plus the 
hash calculation and mixing itself which likely takes a couple cycles. So, 
you're right that it may not eliminate much data in some queries, but the cost 
might be small enough that it's worth doing even if it eliminates 10-20%.

To validate this I loaded 800M rows into a table using 'kudu perf loadgen' and 
then compared scanning with

$ :./build/latest/bin/kudu perf table_scan localhost 
default.loadgen_auto_0b01a9eba6d54f2ba7fa5b59e3a597c5 --columns=int_val 
--num-threads=10 --predicates ["AND", [">=", "int_val", 0]]

vs the same without the predicate. In this case the predicate matches all rows, 
but there's no special short-circuit logic, so it actually evaluates it 
everywhere. Wall clock times:

with the predicate: 1.7023 +- 0.0266 seconds time elapsed  ( +-  
1.56% )
without the predicate: 1.6514 +- 0.0370 seconds time elapsed  ( +-  
2.24% )

Also running perf record -a, I can see that about 5% of the total CPU was used 
on the tserver in ApplyPredicatePrimitive.

Admittedly, this overhead will go up on a percentage basis after some other 
in-flight work we have is finished (eg more CPU-efficient integer encoding 
bankim's working on) but I think we also have more room to optimize 
ApplyPredicatePrimitive by using AVX2 instead of just SSE4 instructions when 
available.

As for the overhead of sending to the coordinator and broadcasting, isn't this 
a _tiny_ amount of data? ie it's only two values, which for primitives is going 
to be a max of 16 bytes each, and for varlen, you could feasibly truncate to a 
fixed size bound even in the case of a long string.

So, I'm generally in favor of sending and evaluating both. In the best case 
it's a huge win (eliminating range partitions) and in the worst case it's 
probably not a major cost.

One further thought here as a future extension: we could provide an API in the 
Kudu scanner that says "this predicate is optional: drop it if you think it's 
not filtering much". For the case of these propagated predicates feeding into a 
join, dropping them at runtime if they are ineffective is worthwhile (I think 
Impala already does something like this in its scanner path, right?)



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 03:49:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-09 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Instead of 'are not sorted(partially sorted within each tablet)',  I meant
This is my understanding, I'll check further within Kudu team.

When the predicate pushed to Kudu includes partition key columns then pruning 
using min-max filter will be done first eliminating bunch of files containing 
column values namely cfiles.

When the predicate doesn't include partition key columns then all column values 
will need to be scanned and as per current implementation for every column 
value first Bloom filter will be checked before checking for range/min-max 
values[1]. So in case of non-partition key columns whether min-max filter will 
help depends upon whether min-max filter values are simply min and max of the 
values inserted in the Bloom filter or not derived from the values inserted in 
the Bloom filter. For the former case, I agree supplying min-max filter with 
Bloom filter won't help.

[1] 
https://github.com/apache/kudu/blob/master/src/kudu/common/column_predicate.h#L332



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 03:23:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-09 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> In the case of both Bloom and Min-Max filters, is the order of evaluation d
Instead of 'are not sorted(partially sorted within each tablet)',  I meant that 
even within a tablet there's no sortedness of that column.



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 02:19:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-09 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
In the case of both Bloom and Min-Max filters, is the order of evaluation 
determined by Kudu ?  For columns that are not sorted (partially sorted within 
each tablet), the Min-Max filters could potentially not eliminate much and be 
wasted effort, so Bloom filter should be evaluated first.  On the other hand, 
for sorted columns,  it would make sense to apply Min-Max filters first since 
Bloom filter has false positives.

It seems that for a specific column, creating two types of filters would incur 
overhead during execution - since both have to be sent to coordinator and 
broadcast to executors.  Any thoughts on that ?



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 10 Apr 2020 02:08:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5749/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 08 Apr 2020 06:20:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-08 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set runtime filter scheme for Kudu. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests with codegen disabled.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M tests/query_test/test_runtime_filters.py
24 files changed, 624 insertions(+), 360 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/3
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5748/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 08 Apr 2020 05:08:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15683/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/2/be/src/exec/kudu-scanner.cc@269
PS2, Line 269: const ColumnType& col_type = 
ColumnType::FromThrift(target_desc.kudu_col_type);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@713
PS2, Line 713:   // Kudu only supports targeting a single column, not 
general exprs, so the target
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@729
PS2, Line 729:   // Kudu only supports targeting a single column, not 
general exprs, so the target
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@732
PS2, Line 732:   // Kudu also cannot currently return nulls if a filter 
is applied, so it does not
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 08 Apr 2020 04:50:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

2020-04-07 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15683


Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
..

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set runtime filter scheme for Kudu. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests with codegen disabled.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M tests/query_test/test_runtime_filters.py
24 files changed, 623 insertions(+), 360 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/2
--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Thomas Tauber-Marshall