Abhishek Rawat has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/13396 )

Change subject: IMPALA-8450: Add support for zstd and lz4 in parquet
......................................................................

IMPALA-8450: Add support for zstd and lz4 in parquet

The Makefile was updated to include zstd in the
${IMPALA_HOME}/toolchain directory. Other changes were made to make the
zstd headers and libs accessible.

Classes ZstandardCompressor and ZstandardDecompressor were added to
provide interfaces for calling the ZSTD_compress/ZSTD_decompress
functions. Zstd supports compression levels (clevel) from 1 to
ZSTD_maxCLevel(). Zstd also supports negative clevels, but since
negative values represent uncompressed data, they are not supported. A
new query option, COMPRESSION_LEVEL, was added so that users can set an
appropriate clevel. The new query option is a no-op for other codecs. A
generic name was used because clevel support could later be added for
other codecs. The default clevel is ZSTD_CLEVEL_DEFAULT.
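
The clevel range described above (1 to ZSTD_maxCLevel(), with negative
values rejected) could be checked along the following lines. This is
only a sketch: the helper name IsSupportedZstdLevel and its max_clevel
parameter are hypothetical and are not taken from the patch. In real
code the upper bound would presumably come from ZSTD_maxCLevel(); it is
passed in here so that the example compiles without linking zstd.

```cpp
// Hypothetical helper, not part of the patch: returns true when `clevel`
// lies in the supported range [1, max_clevel].
// Assumption: the caller obtains `max_clevel` from ZSTD_maxCLevel().
bool IsSupportedZstdLevel(int clevel, int max_clevel) {
  // Zero and negative levels are rejected, matching the commit message's
  // statement that only levels starting at 1 are supported.
  if (clevel < 1) return false;
  // Anything above the library's reported maximum is rejected as well.
  return clevel <= max_clevel;
}
```

A caller would typically run such a check when COMPRESSION_LEVEL is
parsed and report an error for out-of-range values instead of passing
them on to ZSTD_compress.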

HdfsParquetTableWriter was updated to support the LZ4 and ZSTD codecs.
The new codecs can be set using the existing query option as follows:
  set COMPRESSION_CODEC=LZ4;
  set COMPRESSION_CODEC=ZSTD;

Testing:
  - Added a unit test in the DecompressorTest class with the
    ZSTD_CLEVEL_DEFAULT clevel and a random clevel. The unit test
    decompresses compressed input data and validates the result. It
    also tests for the expected behavior when an oversized or
    undersized buffer is passed for decompression.
  - Added unit tests for the new query option, COMPRESSION_LEVEL.
  - Added an e2e test in test_insert_parquet.py that tests writing and
    reading (null/non-null) data into/from a table (with columns of
    different data types) using multiple codecs. Other existing e2e
    tests were updated to also use the parquet/lz4 and parquet/zstd
    table formats.

Change-Id: I98c6dcf3d0a873380e4fa4cf03eb7e924e4ee768
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/catalog/catalog-util.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/experiments/compression-test.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
M be/src/util/decompress.h
M be/src/util/runtime-profile.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindZstd.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
A testdata/workloads/functional-query/queries/QueryTest/insert_parquet_multi_codecs.test
M tests/common/test_dimensions.py
M tests/query_test/test_insert.py
M tests/query_test/test_insert_parquet.py
27 files changed, 400 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/13396/7
--
To view, visit http://gerrit.cloudera.org:8080/13396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I98c6dcf3d0a873380e4fa4cf03eb7e924e4ee768
Gerrit-Change-Number: 13396
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
