Attila Jeges has posted comments on this change. Change subject: IMPALA-3079: Fix sequence file writer ......................................................................
Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/6107/1/be/src/exec/read-write-util-test.cc File be/src/exec/read-write-util-test.cc: PS1, Line 88: > nit: blank space. Done http://gerrit.cloudera.org:8080/#/c/6107/1/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: PS1, Line 217: ~val; > Did you mean to convert this to the absolute value ? If so, you are missing It should be ~val. PS1, Line 219: : : inline int64_t ReadWriteUtil::PutVLong(int64_t val, uint8_t* buf) { : int64_t num_bytes = VLongRequiredBytes(val); : : if (num_bytes == 1) { : // store the value itself instead of the > Typo. I meant bit scan from the MSB. __builtin_clzl() Done PS1, Line 240: buf[i] = (val >> (8 * (num_bytes - i - 1))) & 0xFF; : } > Is this behavior documented somewhere ? Is it part of the sequence file sta Yes, it is part of the sequence file standard and it can be read back by Hive. I've added an E2E test to confirm this. Sequence file documentation: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/SequenceFile.html The Zero-Compressed Integer format is described here: https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/io/WritableUtils.html#writeVLong(java.io.DataOutput, long) The details are in the actual source code (WriteVLong() method): https://hadoop.apache.org/docs/r2.7.2/api/src-html/org/apache/hadoop/io/WritableUtils.html PS1, Line 246: Util::PutVInt(int32 > Big Endianness ? Correct. Changed the comment. http://gerrit.cloudera.org:8080/#/c/6107/1/be/src/util/compress.cc File be/src/util/compress.cc: PS1, Line 209: each preceded with an integer for > , each preceded with an integer for its size. Done PS1, Line 210: // For testing purposes we are going to generate two blocks if input_length >= 4K > That seems a bit inefficient to split into two blocks for input_length == 2 Done,changed the threshold to 4K PS1, Line 235: ReadWriteUtil:: > The declaration can be moved into the loop body. Done http://gerrit.cloudera.org:8080/#/c/6107/1/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test File testdata/workloads/functional-query/queries/QueryTest/seq-writer.test: Line 93: SET COMPRESSION_CODEC=NONE; > what about the other codecs? This test case runs for 8.5 seconds. I've added test cases for other codecs with RECORD/BLOCK compression mode. -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-HasComments: Yes
