Alex Behm has posted comments on this change. Change subject: IMPALA-5250: Unify decompressor output_length semantics ......................................................................
Patch Set 1: (18 comments) http://gerrit.cloudera.org:8080/#/c/8030/1//COMMIT_MSG Commit Message: Line 9: This patch makes the semantics output_length parameter in the semantics of the output_length Line 10: Codec::ProcessBlock to be the same. In existing code different Codec::ProcessBlock() to be the same across all codecs. In the existing code different decompressors treat output_length differently. Line 13: to actual decompressed length, but it does not set it to actual to the actual decompressed length Line 16: be exactly the same as actual decompressed length, otherwise as the actual decompressed length Line 19: actual decompressed length and will set it to actual decompressed the actual decompressed length Line 21: This inconsistency leads to a bug where the error message is This inconsistency lead to a bug where .... Line 24: decompressors can handle oversized output buffer correctly. can handle an oversized output buffer correctly. Line 25: Lz4Decompressor will use the "safe" version of decompression function will use the "safe" instead of the "fast" decompression function Line 26: instead of the "fast" version, for the latter is insecure with corrupted We use LZ4 for compressing exchange data, so we check the perf impact of these changes. I can help you with the details. Line 27: data and requires decompressed length to be known. and requires the decompressed length to be known http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress-test.cc File be/src/util/decompress-test.cc: Line 153: // Test the behavior when the decompressor is given too little / too much space "." at end of sentence Line 155: // correct output size when the space is enough, and does not write beyond the output the correct output size Line 157: void DecompressOverUnderSizedOutputBuffer(Codec *compressor, Codec *decompressor, Out style is "Codec* compressor" Line 196: } I think we should test more directly the scenario mentioned in the JIRA where a corrupt file can lead to the output_len not being set to 0 properly. http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress.cc File be/src/util/decompress.cc: Line 426: // If size_only is false, output buffer size must be at least output_len. *output_len is Mention what output_len is set to if a non-OK status is returned. Line 427: // also updated with actual output size. the actual output size Line 575: // LZ4_decompress_fast will cause a segmentation fault if passed a nullptr output. Comment still relevant? Line 579: if (ret < 0) { single line -- To view, visit http://gerrit.cloudera.org:8080/8030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-HasComments: Yes