Daniel Becker has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
......................................................................

IMPALA-9470: Use Parquet Bloom filters - Part 1

This change adds read support for Parquet Bloom filters for some types.
The supported Parquet type - Impala type pairs are the following:

 ---------------------------------------
|Parquet type |  Impala type            |
|---------------------------------------|
|INT32        |  TINYINT, SMALLINT, INT |
|INT64        |  BIGINT                 |
|FLOAT        |  FLOAT                  |
|DOUBLE       |  DOUBLE                 |
|BYTE_ARRAY   |  STRING                 |
 ---------------------------------------

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests
    that Parquet Bloom filtering works for the supported types and that
    we do not incorrectly discard row groups for the unsupported type
    VARCHAR.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/kudu/util/block_bloom_filter.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/thrift/parquet.thrift
A testdata/data/parquet-bloom-filtering.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
27 files changed, 6,848 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/15
--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
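
The rules stated in the commit message above (the supported type pairs, and skipping the Bloom filter when a column is fully dictionary encoded) can be sketched as a small decision function. This is only an illustration in Python and not the C++ code in this change: the names `SUPPORTED_PAIRS` and `should_use_bloom_filter`, and all of its parameters, are hypothetical.

```python
# Hypothetical sketch of the rules described in the commit message.
# This is not Impala's actual code; every name here is made up for illustration.

# Supported Parquet type -> Impala type pairs, copied from the commit message.
SUPPORTED_PAIRS = {
    "INT32": {"TINYINT", "SMALLINT", "INT"},
    "INT64": {"BIGINT"},
    "FLOAT": {"FLOAT"},
    "DOUBLE": {"DOUBLE"},
    "BYTE_ARRAY": {"STRING"},
}


def should_use_bloom_filter(parquet_type, impala_type, has_bloom_filter,
                            fully_dict_encoded):
    """Return True if the Parquet Bloom filter should be consulted for a column."""
    if not has_bloom_filter:
        return False
    # A fully dictionary-encoded column can be filtered exactly via its
    # dictionary, so the (probabilistic) Bloom filter is skipped.
    if fully_dict_encoded:
        return False
    # Unsupported pairs (for example VARCHAR) never use the Bloom filter,
    # so they can never cause a row group to be discarded.
    return impala_type in SUPPORTED_PAIRS.get(parquet_type, set())
```

For example, under these assumptions `should_use_bloom_filter("BYTE_ARRAY", "VARCHAR", True, False)` is false, which corresponds to the VARCHAR case that the new test is said to cover.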