Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8030 )
Change subject: IMPALA-5250: Unify decompressor output_length semantics
......................................................................

IMPALA-5250: Unify decompressor output_length semantics

This patch makes the semantics of the output_length parameter in
Codec::ProcessBlock the same across all codecs. In the existing code,
different decompressors treat output_length differently:
1. SnappyDecompressor needs output_length to be greater than or equal
   to the actual decompressed length, but it does not set it to the
   actual decompressed length after decompression.
2. SnappyBlockDecompressor and Lz4Decompressor require output_length to
   be exactly the same as the actual decompressed length; otherwise
   decompression fails.
3. Other decompressors need output_length to be greater than or equal
   to the actual decompressed length and will set it to the actual
   decompressed length if it is oversized.
This inconsistency leads to a bug where the error message is
nondeterministic when the compressed block is corrupted.

This patch makes all decompressors behave like a modified version of 3:
output_length must be greater than or equal to the actual decompressed
length, and it will be set to the actual decompressed length if it is
oversized. A decompression failure sets it to 0.
Lz4Decompressor will use the "safe" decompression function instead of
the "fast" one, because the latter is unsafe with corrupted data and
requires the decompressed length to be known.

Testing:
A test case is added that checks that the decompressors handle an
oversized output buffer correctly. A regression test for the exact case
described in IMPALA-5250 is also added.
A benchmark was run on a 16-node cluster to measure the performance
impact of the Lz4Decompressor change, and no performance regression was
found.
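
The unified contract described above can be illustrated with a minimal sketch. This is not Impala's code: ToyRleDecompress and its run-length input format are hypothetical and exist only to show the three rules (the caller passes the buffer capacity in output_length, success sets it to the actual decompressed length, failure sets it to 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical toy decompressor (NOT part of Impala). The input is a
// sequence of (run_length, byte) pairs.
//
// Contract, modeled on the commit message:
//   - On entry, *output_length is the capacity of 'output' and may be
//     larger than the actual decompressed length.
//   - On success, *output_length is set to the actual decompressed length.
//   - On failure, *output_length is set to 0.
bool ToyRleDecompress(const uint8_t* input, int64_t input_length,
                      int64_t* output_length, uint8_t* output) {
  assert(output_length != nullptr);
  const int64_t capacity = *output_length;
  int64_t written = 0;

  // A well-formed input consists of complete (run_length, byte) pairs.
  if (input_length < 0 || input_length % 2 != 0) {
    *output_length = 0;
    return false;
  }

  for (int64_t i = 0; i < input_length; i += 2) {
    const int64_t run = input[i];
    // Treat a zero-length run as corruption and never write past capacity.
    if (run == 0 || written + run > capacity) {
      *output_length = 0;
      return false;
    }
    std::memset(output + written, input[i + 1], static_cast<std::size_t>(run));
    written += run;
  }

  // Report how many bytes were actually produced.
  *output_length = written;
  return true;
}
```

With this shape, a caller can pass an upper bound for the buffer size and rely on output_length alone to learn the result size, and a value of 0 after a failed call is unambiguous regardless of which codec ran.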
Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
Reviewed-on: http://gerrit.cloudera.org:8080/8030
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Impala Public Jenkins
---
M be/src/util/codec.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
3 files changed, 74 insertions(+), 45 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
Gerrit-Change-Number: 8030
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
