Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17262 )
Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
......................................................................

WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

This change adds support for writing Parquet Bloom filters for the types
for which read support was added in IMPALA-10640.

Writing of Parquet Bloom filters can be controlled by the
'parquet_bloom_filter_write' query option which has the following
possible values:
  NEVER      - never write Parquet Bloom filters
  TBL_PROPS  - write Parquet Bloom filters as set in table properties
  IF_NO_DICT - write Parquet Bloom filters if the row group is not fully
               dictionary encoded
  ALWAYS     - always write Parquet Bloom filters, even if the row group
               is fully dictionary encoded

TODO: Implement table properties involving Parquet Bloom filters.
TODO: Decide size of Parquet Bloom filter based on NDV heuristics or
      configuration.

Testing:
  - Added a test in tests/query_test/test_parquet_bloom_filter.py that
    uses Impala to write the same table as in the test file
    'testdata/data/parquet-bloom-filtering.parquet' and checks whether
    the Parquet Bloom filter header and bitset are identical.

Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-bloom-filter-util.cc
M be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/dict-encoding.h
M be/src/util/parquet-bloom-filter-test.cc
M be/src/util/parquet-bloom-filter.cc
M be/src/util/parquet-bloom-filter.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_parquet_bloom_filter.py
16 files changed, 486 insertions(+), 30 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/6
--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
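
For readers skimming the four 'parquet_bloom_filter_write' values described in the commit message, the decision they imply might be sketched roughly as follows. This is only an illustration in Python, not the code in hdfs-parquet-table-writer.cc: the enum class, the function name should_write_bloom_filter and the enabled_in_tbl_props flag are all hypothetical, and since table properties are still a TODO in this patch set, the TBL_PROPS branch is an assumption.

```python
from enum import Enum


class ParquetBloomFilterWrite(Enum):
    """Hypothetical mirror of the query option values named in the commit message."""
    NEVER = "NEVER"
    TBL_PROPS = "TBL_PROPS"
    IF_NO_DICT = "IF_NO_DICT"
    ALWAYS = "ALWAYS"


def should_write_bloom_filter(option, fully_dict_encoded, enabled_in_tbl_props=False):
    """Return True if a Bloom filter would be written for a row group.

    fully_dict_encoded: whether the row group is fully dictionary encoded.
    enabled_in_tbl_props: assumed stand-in for the not-yet-implemented
        table properties (hypothetical; the patch lists this as a TODO).
    """
    if option is ParquetBloomFilterWrite.NEVER:
        return False
    if option is ParquetBloomFilterWrite.ALWAYS:
        # Written even if the row group is fully dictionary encoded.
        return True
    if option is ParquetBloomFilterWrite.IF_NO_DICT:
        return not fully_dict_encoded
    # TBL_PROPS: assumption, defer entirely to the table-level setting.
    return enabled_in_tbl_props


if __name__ == "__main__":
    for opt in ParquetBloomFilterWrite:
        for dict_encoded in (False, True):
            print(opt.name, dict_encoded, should_write_bloom_filter(opt, dict_encoded))
```

In a session the option would presumably be set with Impala's usual SET statement, e.g. SET parquet_bloom_filter_write=ALWAYS; though the accepted spelling of the values depends on the final form of this WIP change.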