Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13857 )

Change subject: IMPALA-8549: Add support for scanning DEFLATE text files
......................................................................

IMPALA-8549: Add support for scanning DEFLATE text files

This patch adds support to Impala for scanning .DEFLATE files that
belong to tables stored as text. Note that although these files have a
compression type of DEFLATE in Impala, they should be treated as if
their compression type were DEFAULT.

Hadoop tools such as Hive and MapReduce support reading and writing
text files compressed using the deflate algorithm, which is the default
compression type. Hadoop uses the zlib library (an implementation of
the DEFLATE algorithm) to compress text files into .DEFLATE files.
These files are not in the raw deflate format but in the zlib format:
the zlib library supports three flavors of deflate, and Hadoop uses
the flavor that wraps the deflate data in zlib framing rather than
emitting just raw deflate.
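
As an illustration of the distinction above (not code from this patch),
the following minimal sketch uses Python's standard zlib module to
produce the three flavors that the zlib library supports. The names
compress_as, try_decompress and the sample payload are hypothetical,
chosen only for this example; the wbits values are the ones the Python
zlib module documents for raw deflate, zlib and gzip framing.

```python
import zlib

# wbits selects the container format around the deflate stream.
RAW_DEFLATE = -15   # no header or trailer
ZLIB_FORMAT = 15    # 2-byte zlib header + Adler-32 trailer
GZIP_FORMAT = 31    # gzip header + CRC-32 trailer


def compress_as(data: bytes, wbits: int) -> bytes:
    """Compress data with the deflate algorithm in the chosen framing."""
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


def try_decompress(data: bytes, wbits: int):
    """Return the decompressed bytes, or None if the framing is rejected."""
    try:
        return zlib.decompress(data, wbits)
    except zlib.error:
        return None


# Hypothetical sample rows standing in for a text table.
payload = b"1,alice\n2,bob\n" * 100

raw_bytes = compress_as(payload, RAW_DEFLATE)
zlib_bytes = compress_as(payload, ZLIB_FORMAT)
gzip_bytes = compress_as(payload, GZIP_FORMAT)

# A reader configured for the zlib format recovers the data.
roundtrip = try_decompress(zlib_bytes, ZLIB_FORMAT)

# A reader configured for raw deflate is not expected to recover it,
# because the stream starts with the zlib header.
mismatch = try_decompress(zlib_bytes, RAW_DEFLATE)

# Stripping the 2-byte header and 4-byte Adler-32 trailer leaves a raw
# deflate stream, which shows that the zlib format is only a wrapper.
unwrapped = try_decompress(zlib_bytes[2:-4], RAW_DEFLATE)
```

A scanner that assumes raw deflate for a .DEFLATE file written by
Hadoop would hit the mismatch case above, which is why the zlib
framing matters here.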

Testing:
A pre-existing unit test validates compressing and decompressing data
with compression type DEFLATE. This patch also modifies the existing
end-to-end tests that query files of various formats and compression
types. All core and exhaustive tests pass.

Change-Id: I45e41ab5a12637d396fef0812a09d71fa839b27a
Reviewed-on: http://gerrit.cloudera.org:8080/13857
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M tests/query_test/test_compressed_formats.py
5 files changed, 25 insertions(+), 19 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I45e41ab5a12637d396fef0812a09d71fa839b27a
Gerrit-Change-Number: 13857
Gerrit-PatchSet: 12
Gerrit-Owner: Ethan Xue <ethan....@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Ethan Xue <ethan....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>