Xiaomeng Zhang has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/15023 )

Change subject: IMPALA-9075: Add support for reading zstd text files
......................................................................

IMPALA-9075: Add support for reading zstd text files

In this patch, we add support for reading zstd encoded text files.
This includes:
1. support reading zstd file written by Hive which uses streaming.
2. support reading zstd file compressed by standard zstd library which
uses block.
To support decompressing both formats, a function ProcessBlockStreaming
is added in zstd decompressor.

Testing done:
Added two backend tests:
1. streaming decompress test.
2. large data test for both block and streaming decompress.
Added two end to end tests:
1. hive and impala integration. For four compression codecs, write in
hive and read from impala.
2. zstd library and impala integration. Copy a zstd lib compressed file
to HDFS, and read from impala.

Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
M be/src/util/decompress.h
M bin/rat_exclude_files.txt
A testdata/data/text_large_zstd.txt
A testdata/data/text_large_zstd.zst
A tests/custom_cluster/test_hive_text_codec_interop.py
M tests/query_test/test_compressed_formats.py
11 files changed, 10,000,278 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/15023/7
--
To view, visit http://gerrit.cloudera.org:8080/15023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
Gerrit-Change-Number: 15023
Gerrit-PatchSet: 7
Gerrit-Owner: Xiaomeng Zhang <xiaom...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xiaom...@cloudera.com>