Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16278 )
Change subject: IMPALA-10005: Fix Snappy decompression for non-block filesystems ...................................................................... Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG@9 PS2, Line 9: Snappy-compressed text always uses THdfsCompression::SNAPPY_BLOCKED > is this true of all engines? do all engines use SNAPPY_BLOCKED for snappy-c Under the covers, we were already using SNAPPY_BLOCKED before. All our Snappy text test tables are written by Hive, and Hive uses SNAPPY_BLOCKED. Our text scanner has an assert that verifies that we don't pass THdfsCompression::SNAPPY into it. In the old code, the way this worked is that we translated SNAPPY to THdfsCompression::SNAPPY_BLOCKED when we converted to thrift here: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java#L79 The non-block filesystems don't go through that Thrift codepath. They go through a flatbuffers codepath, which inadvertently didn't do the conversion from SNAPPY to SNAPPY_BLOCKED. https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java#L93 Long story short: the behavior should be the same as before (except non-block is fixed) http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG@10 PS2, Line 10: for non-block filesystems > why does this only happen for non-block filesystems? Rolled the answer to this into the other comment. http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG@24 PS2, Line 24: Changed the utility function so it doesn't use Hive. > adding coverage for S3 is nice, but do we lose any inter-operability covera The Hive statements are "create table like" statements and "drop table" statements, which are purely metadata. The compression codec is not part of the metadata. I don't think we lose any coverage that we don't have covered by other tests. The data that we are using was already written by Hive during dataload (i.e. we aren't writing data with Hive in this test). -- To view, visit http://gerrit.cloudera.org:8080/16278 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0879f2fc0bf75bb5c15cecb845ece46a901601ac Gerrit-Change-Number: 16278 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Thu, 06 Aug 2020 20:02:52 +0000 Gerrit-HasComments: Yes
