Alex Behm has posted comments on this change.

Change subject: IMPALA-5250: Unify decompressor output_length semantics
......................................................................


Patch Set 1:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/8030/1//COMMIT_MSG
Commit Message:

Line 9: This patch makes the semantics output_length parameter in
the semantics of the output_length


Line 10: Codec::ProcessBlock to be the same. In existing code different
Codec::ProcessBlock() to be the same across all codecs. In the existing code 
different decompressors treat output_length differently.


Line 13:    to actual decompressed length, but it does not set it to actual
to the actual decompressed length


Line 16:    be exactly the same as actual decompressed length, otherwise
as the actual decompressed length


Line 19:    actual decompressed length and will set it to actual decompressed
the actual decompressed length


Line 21: This inconsistency leads to a bug where the error message is
This inconsistency lead to a bug where ....


Line 24: decompressors can handle oversized output buffer correctly.
can handle an oversized output buffer correctly.


Line 25: Lz4Decompressor will use the "safe" version of decompression function
will use the "safe" instead of the "fast" decompression function


Line 26: instead of the "fast" version, for the latter is insecure with 
corrupted
We use LZ4 for compressing exchange data, so we check the perf impact of these 
changes. I can help you with the details.


Line 27: data and requires decompressed length to be known.
and requires the decompressed length to be known


http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress-test.cc
File be/src/util/decompress-test.cc:

Line 153:   // Test the behavior when the decompressor is given too little / 
too much space
"." at end of sentence


Line 155:   // correct output size when the space is enough, and does not write 
beyond the output
the correct output size


Line 157:   void DecompressOverUnderSizedOutputBuffer(Codec *compressor, Codec 
*decompressor,
Out style is "Codec* compressor"


Line 196:     }
I think we should test more directly the scenario mentioned in the JIRA where a 
corrupt file can lead to the output_len not being set to 0 properly.


http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress.cc
File be/src/util/decompress.cc:

Line 426: // If size_only is false, output buffer size must be at least 
output_len. *output_len is
Mention what output_len is set to if a non-OK status is returned.


Line 427: // also updated with actual output size.
the actual output size


Line 575:   // LZ4_decompress_fast will cause a segmentation fault if passed a 
nullptr output.
Comment still relevant?


Line 579:   if (ret < 0) {
single line


-- 
To view, visit http://gerrit.cloudera.org:8080/8030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to