Daniel Becker has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
......................................................................

WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

This change adds support for writing Parquet Bloom filters for the types
for which read support was added in IMPALA-10640.

Writing of Parquet Bloom filters is controlled by the
'parquet_bloom_filter_write' query option, which has the following
possible values:
  NEVER      - never write Parquet Bloom filters
  TBL_PROPS  - write Parquet Bloom filters as set in table properties
  IF_NO_DICT - write Parquet Bloom filters if the row group is not
               fully dictionary encoded
  ALWAYS     - always write Parquet Bloom filters, even if the row
               group is fully dictionary encoded
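
The option semantics listed above can be sketched as a small decision
function. This is an illustration only, not the code in
hdfs-parquet-table-writer.cc: the names ParquetBloomFilterWrite,
should_write_bloom_filter and enabled_in_tbl_props are hypothetical, and
the sketch assumes that IF_NO_DICT and ALWAYS do not consult table
properties (which are not implemented in this patch set).

```python
from enum import Enum


class ParquetBloomFilterWrite(Enum):
    """Values of the 'parquet_bloom_filter_write' query option."""
    NEVER = "NEVER"
    TBL_PROPS = "TBL_PROPS"
    IF_NO_DICT = "IF_NO_DICT"
    ALWAYS = "ALWAYS"


def should_write_bloom_filter(mode, fully_dict_encoded,
                              enabled_in_tbl_props=False):
    """Hypothetical helper: decide whether to write a Bloom filter.

    mode                 -- a ParquetBloomFilterWrite value
    fully_dict_encoded   -- True if the row group is fully dictionary encoded
    enabled_in_tbl_props -- assumed stand-in for the table property lookup
    """
    if mode is ParquetBloomFilterWrite.NEVER:
        return False
    if mode is ParquetBloomFilterWrite.TBL_PROPS:
        # Assumption: table properties alone decide in this mode.
        return enabled_in_tbl_props
    if mode is ParquetBloomFilterWrite.IF_NO_DICT:
        return not fully_dict_encoded
    if mode is ParquetBloomFilterWrite.ALWAYS:
        return True
    raise ValueError("unknown parquet_bloom_filter_write value: %r" % (mode,))
```

For example, with IF_NO_DICT a fully dictionary encoded row group gets
no Bloom filter, while ALWAYS writes one regardless of the encoding.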

TODO: Implement table properties involving Parquet Bloom filters.

TODO: Decide the size of the Parquet Bloom filter based on NDV
heuristics or configuration.
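
One possible shape for an NDV-based heuristic, as a rough sketch only.
It uses the classic Bloom filter sizing formula m = -n * ln(p) / ln(2)^2
rather than a formula specific to split-block filters, and it is not
what this patch set implements. The function name bloom_filter_bytes,
the default false positive probability, the power-of-two rounding and
the upper bound are all assumptions made for illustration; the 32-byte
minimum corresponds to one 256-bit block of a split-block Bloom filter.

```python
import math

# One block of a split-block Bloom filter is 256 bits (32 bytes).
BLOCK_BYTES = 32


def bloom_filter_bytes(ndv, fpp=0.01, max_bytes=128 * 1024 * 1024):
    """Hypothetical sizing helper based on the classic Bloom filter formula.

    ndv       -- estimated number of distinct values in the column chunk
    fpp       -- target false positive probability (assumed default)
    max_bytes -- assumed upper bound on the filter size
    """
    if ndv <= 0:
        return BLOCK_BYTES
    # Classic sizing: m = -n * ln(p) / (ln 2)^2 bits.
    bits = -ndv * math.log(fpp) / (math.log(2) ** 2)
    nbytes = max(BLOCK_BYTES, math.ceil(bits / 8))
    # Assumption: round up to the next power of two number of bytes.
    nbytes = 1 << (nbytes - 1).bit_length()
    return min(nbytes, max_bytes)
```

With these assumptions, an estimate of 1000 distinct values at a 1%
false positive probability needs about 1.2 KB, which rounds up to 2048
bytes.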

Testing:
  - Added a test in tests/query_test/test_parquet_bloom_filter.py that
    uses Impala to write the same table as the one stored in the test
    file 'testdata/data/parquet-bloom-filtering.parquet' and checks that
    the Parquet Bloom filter header and bitset are identical in both.

Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-bloom-filter-util.cc
M be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/dict-encoding.h
M be/src/util/parquet-bloom-filter-test.cc
M be/src/util/parquet-bloom-filter.cc
M be/src/util/parquet-bloom-filter.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_parquet_bloom_filter.py
16 files changed, 486 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/6
--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
