Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13857 )
Change subject: IMPALA-8549: Add support for scanning DEFLATE text files
......................................................................

IMPALA-8549: Add support for scanning DEFLATE text files

This patch adds support to Impala for scanning .DEFLATE files of
tables stored as text. To avoid confusion, it should be noted that
although these files have a compression type of DEFLATE in Impala,
they should be treated as if their compression type is DEFAULT.

Hadoop tools such as Hive and MapReduce support reading and writing
text files compressed using the deflate algorithm, which is the default
compression type. Hadoop uses the zlib library (an implementation of
the DEFLATE algorithm) to compress text files into .DEFLATE files,
which are not in the raw deflate format but rather the zlib format
(the zlib library supports three flavors of deflate, and Hadoop uses
the flavor that compresses data into deflate with zlib wrappings rather
than just raw deflate).

Testing:
There is a pre-existing unit test that validates compressing and
decompressing data with compression type DEFLATE. Also, modified
existing end-to-end testing that simulates querying files of various
formats and compression types. All core and exhaustive tests pass.
Change-Id: I45e41ab5a12637d396fef0812a09d71fa839b27a
Reviewed-on: http://gerrit.cloudera.org:8080/13857
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M tests/query_test/test_compressed_formats.py
5 files changed, 25 insertions(+), 19 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/13857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I45e41ab5a12637d396fef0812a09d71fa839b27a
Gerrit-Change-Number: 13857
Gerrit-PatchSet: 12
Gerrit-Owner: Ethan Xue <ethan....@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Ethan Xue <ethan....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
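
The commit message distinguishes raw deflate from the zlib format and mentions three flavors supported by the zlib library. As a rough illustration only (this is not the Impala or Hadoop code, and the patch itself is C++), the sketch below uses Python's standard zlib module to produce all three flavors and to show that a zlib-format stream is a raw deflate stream wrapped in a 2-byte header and a 4-byte Adler-32 trailer. The helper names inflate and looks_like_zlib and the sample data are hypothetical, chosen for this example.

```python
import zlib

# Hypothetical sample rows standing in for the contents of a text table.
data = b"1,alice\n2,bob\n" * 100


def deflate(buf: bytes, wbits: int) -> bytes:
    """Compress buf; wbits selects the container format around deflate."""
    c = zlib.compressobj(wbits=wbits)
    return c.compress(buf) + c.flush()


def inflate(buf: bytes, wbits: int) -> bytes:
    """Decompress buf, expecting the container format selected by wbits."""
    d = zlib.decompressobj(wbits=wbits)
    return d.decompress(buf) + d.flush()


# The three flavors the zlib library can produce:
zlib_wrapped = deflate(data, zlib.MAX_WBITS)       # zlib format (RFC 1950)
raw_deflate = deflate(data, -zlib.MAX_WBITS)       # raw deflate (RFC 1951)
gzip_wrapped = deflate(data, zlib.MAX_WBITS | 16)  # gzip format (RFC 1952)


def looks_like_zlib(buf: bytes) -> bool:
    """Check the 2-byte zlib header: method 8 (deflate) and a valid FCHECK."""
    if len(buf) < 2:
        return False
    cmf, flg = buf[0], buf[1]
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


# Each flavor round-trips only when the reader expects the same container.
restored_zlib = inflate(zlib_wrapped, zlib.MAX_WBITS)
restored_raw = inflate(raw_deflate, -zlib.MAX_WBITS)
restored_gzip = inflate(gzip_wrapped, zlib.MAX_WBITS | 16)

# Stripping the zlib header and Adler-32 trailer leaves a raw deflate stream.
inner = inflate(zlib_wrapped[2:-4], -zlib.MAX_WBITS)
trailer = int.from_bytes(zlib_wrapped[-4:], "big")
```

Under these assumptions, a reader of such a file needs to initialize its decompressor for the zlib container rather than for raw deflate, which is the distinction the commit message draws for Hadoop-written .DEFLATE files.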