Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16278 )

Change subject: IMPALA-10005: Fix Snappy decompression for non-block filesystems
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG@9
PS2, Line 9: Snappy-compressed text always uses THdfsCompression::SNAPPY_BLOCKED
> is this true of all engines? do all engines use SNAPPY_BLOCKED for snappy-c
Under the covers, we were already using SNAPPY_BLOCKED before. All our Snappy 
text test tables are written by Hive, and Hive uses SNAPPY_BLOCKED. Our text 
scanner has an assert that verifies that we don't pass THdfsCompression::SNAPPY 
into it.

In the old code, the way this worked is that we translated SNAPPY to 
THdfsCompression::SNAPPY_BLOCKED when we converted to thrift here:
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java#L79

The non-block filesystems don't go through that Thrift codepath. They go 
through a flatbuffers codepath, which inadvertently didn't do the conversion 
from SNAPPY to SNAPPY_BLOCKED.
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java#L93

Long story short: the behavior should be the same as before (except non-block 
is fixed)


http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG@10
PS2, Line 10: for non-block filesystems
> why does this only happen for non-block filesystems?
Rolled the answer to this into the other comment.


http://gerrit.cloudera.org:8080/#/c/16278/2//COMMIT_MSG@24
PS2, Line 24: Changed the utility function so it doesn't use Hive.
> adding coverage for S3 is nice, but do we lose any inter-operability covera
The Hive statements are "create table like" statements and "drop table" 
statements, which are purely metadata. The compression codec is not part of the 
metadata.

I don't think we lose any coverage that we don't have covered by other tests.

The data that we are using was already written by Hive during dataload (i.e. we 
aren't writing data with Hive in this test).



--
To view, visit http://gerrit.cloudera.org:8080/16278
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0879f2fc0bf75bb5c15cecb845ece46a901601ac
Gerrit-Change-Number: 16278
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Thu, 06 Aug 2020 20:02:52 +0000
Gerrit-HasComments: Yes

Reply via email to