Impala Public Jenkins has submitted this change and it was merged. (
http://gerrit.cloudera.org:8080/13507 )
Change subject: IMPALA-8450: Add support for zstd in parquet
......................................................................
IMPALA-8450: Add support for zstd in parquet
The Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make the zstd headers and libs
accessible.
Classes ZstandardCompressor and ZstandardDecompressor were added to
provide interfaces for calling the ZSTD_compress/ZSTD_decompress
functions. Zstd supports compression levels (clevel) from 1 to
ZSTD_maxCLevel(). Zstd also supports negative clevels, but since negative
values represent uncompressed data, they are not supported. The default
clevel is ZSTD_CLEVEL_DEFAULT.
HdfsParquetTableWriter was updated to support the ZSTD codec. The
new codec can be set using the existing query option as follows:
set COMPRESSION_CODEC=ZSTD:<clevel>;
set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT
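As a rough illustration of the option syntax above, the following Python
sketch parses a value such as ZSTD:<clevel> and applies the clevel rules
from this commit message. It is not the code in query-options.cc. The
function name, the rule that only ZSTD accepts a level, and the hard-coded
constants are assumptions made for the example; 3 and 22 are the values zstd
releases report for ZSTD_CLEVEL_DEFAULT and ZSTD_maxCLevel(), and real code
should query the library instead.

```python
# Assumed stand-ins for ZSTD_CLEVEL_DEFAULT and ZSTD_maxCLevel(); real code
# should ask the zstd library rather than hard-code these values.
ZSTD_CLEVEL_DEFAULT = 3
ZSTD_MAX_CLEVEL = 22


def parse_compression_codec(value):
    """Split 'CODEC' or 'CODEC:<clevel>' into a (codec, clevel) tuple.

    Hypothetical helper: clevel is None for codecs that take no level.
    """
    codec, sep, level_text = value.strip().partition(":")
    codec = codec.upper()
    if not codec:
        raise ValueError("empty codec name")
    if not sep:
        # No explicit level: ZSTD falls back to the default clevel.
        return codec, (ZSTD_CLEVEL_DEFAULT if codec == "ZSTD" else None)
    if codec != "ZSTD":
        raise ValueError("only ZSTD accepts a compression level in this sketch")
    try:
        clevel = int(level_text)
    except ValueError:
        raise ValueError("invalid compression level: %r" % level_text)
    # Negative and zero levels are rejected, matching the 1..max range above.
    if not 1 <= clevel <= ZSTD_MAX_CLEVEL:
        raise ValueError(
            "compression level %d is outside [1, %d]" % (clevel, ZSTD_MAX_CLEVEL)
        )
    return codec, clevel
```

With these assumptions, parse_compression_codec("ZSTD") yields the default
clevel, while values such as ZSTD:0 or ZSTD:-1 raise ValueError.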
Testing:
- Added a unit test in the DecompressorTest class that uses
ZSTD_CLEVEL_DEFAULT and a random clevel. The unit test decompresses
compressed input data and validates the result. It also checks for
the expected behavior when an oversized or undersized buffer is
passed for decompression.
- Added unit tests for valid/invalid values for COMPRESSION_CODEC.
- Added an e2e test in test_insert_parquet.py that tests writing and
reading (null/non-null) data to and from a table (with columns of
different data types) using multiple codecs. Other existing e2e tests
were updated to also use the parquet/zstd table format.
- Manual interoperability tests were run between Impala and Hive.
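The shape of the round-trip and buffer-size checks described above can be
sketched in Python. This is only an illustration, not the DecompressorTest
code: zlib from the standard library is swapped in for zstd, and
decompress_into is a hypothetical helper that imitates decompressing into a
fixed-size output buffer.

```python
import zlib


def decompress_into(compressed, capacity):
    """Decompress into at most `capacity` bytes, like a fixed-size buffer.

    Hypothetical helper; zlib stands in for zstd in this sketch.
    """
    decompressor = zlib.decompressobj()
    # Request one byte more than the capacity so an overflow is detectable.
    out = decompressor.decompress(compressed, capacity + 1)
    if len(out) > capacity:
        raise ValueError("output buffer too small")
    if not decompressor.eof:
        raise ValueError("compressed input is truncated or corrupt")
    return out


# Round trip: compress, then decompress with an exact and an oversized buffer.
payload = b"impala zstd parquet " * 100
compressed = zlib.compress(payload, 6)
exact = decompress_into(compressed, len(payload))
oversized = decompress_into(compressed, len(payload) + 64)
```

An undersized capacity, for example len(payload) - 1, makes the helper raise
ValueError instead of returning truncated data.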
Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Reviewed-on: http://gerrit.cloudera.org:8080/13507
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/catalog/catalog-util.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/experiments/compression-test.cc
M be/src/service/child-query.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
M be/src/util/decompress.h
M be/src/util/runtime-profile.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindZstd.cmake
M common/thrift/CatalogObjects.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/generate_error_codes.py
A testdata/workloads/functional-query/queries/QueryTest/insert_parquet_multi_codecs.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
M tests/common/test_dimensions.py
M tests/query_test/test_insert.py
M tests/query_test/test_insert_parquet.py
30 files changed, 497 insertions(+), 98 deletions(-)
Approvals:
Tim Armstrong: Looks good to me, approved
Impala Public Jenkins: Verified
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Gerrit-Change-Number: 13507
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>