Xiaomeng Zhang has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/15023 )

Change subject: IMPALA-9075: Add support for reading zstd text files
......................................................................

IMPALA-9075: Add support for reading zstd text files

In this patch, we add support for reading zstd encoded text files.
This includes:
1. support reading zstd file written by Hive which uses streaming.
2. support reading zstd file compressed by standard zstd library which
uses block.
To support decompressing both formats, a function ProcessBlockStreaming
is added in zstd decompressor.

Testing done:
Added two backend tests:
1. streaming decompress test.
2. large data test for both block and streaming decompress.
Added two end to end tests:
1. hive and impala integration. For four compression codecs, write in
hive and read from impala.
2. zstd library and impala integration. Copy a zstd lib compressed file
to HDFS, and read from impala.

Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
M be/src/util/decompress.h
M bin/rat_exclude_files.txt
A testdata/data/text_large_zstd.zst
A tests/custom_cluster/test_hive_text_codec_interop.py
M tests/query_test/test_compressed_formats.py
10 files changed, 247 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/15023/5
--
To view, visit http://gerrit.cloudera.org:8080/15023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
Gerrit-Change-Number: 15023
Gerrit-PatchSet: 5
Gerrit-Owner: Xiaomeng Zhang <xiaom...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xiaom...@cloudera.com>

Reply via email to