Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8267 )
Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding ...................................................................... Patch Set 8: (19 comments) http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h File be/src/util/bit-packing.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@67 PS8, Line 67: static std::pair<const uint8_t*, int64_t> UnpackValues(const uint8_t* __restrict__ in, > Could this one be private? I feel like I created an artificial distinction by making some methods private. I don't see any reason why most of the private methods shouldn't be public - this isn't a stateful object so it's not possible to violate any internal invariants by using the private functions. Maybe I should just make them all public except for NumValuesToUnpack(), which is a helper function? I did that in the current patchset and improved the documentation for the newly public functions. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@81 PS8, Line 81: static std::pair<const uint8_t*, int64_t> UnpackAndDecodeValues( > Could this one be private? See above comment. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@86 PS8, Line 86: /// Unpack exactly 32 values of 'bit_width' from 'in' to 'out'. 'in' must point to > Is there a benefit of having those public over making them private and make See above comment http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@114 PS8, Line 114: 'bit_width' > Is this comment still correct? From looking at the implementation I also th The bit about 'bit_width' is incorrect, that version of the function no longer exists. You're right that it's really up to 31 values. Fixed the function name and comment. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.inline.h@56 PS8, Line 56: return UnpackValues<OutType, i>(in, in_bytes, num_values, out); > nit: I think adding a blank line between the definition of the macro and ca I restructured it a bit more to separate out the macro definition from the switch. Agree that it wasn't that readable as it - not sure why I structured it like that in the first place. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.h File be/src/util/bit-stream-utils.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.h@33 PS8, Line 33: /// TODO: replace uses with the more efficient BatchedBitReader. > Would that mean replacing the Writer with the BatchedReader? Oops, no, I meant this comment to apply to the old BitReader when I was partway through this patch, but I ended up removing it entirely and missed the comment. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.inline.h File be/src/util/bit-stream-utils.inline.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.inline.h@115 PS8, Line 115: auto result = BitPacking::UnpackAndDecodeValues(bit_width, buffer_pos_, > You could use std::tie to unpack the result. Seems to work, thanks for the tip. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@93 PS8, Line 93: uint32_t NextNumRepeats(); > Could we convert all the uint32_t to int64_t while we're here? There's alre It looks like the counts could actually be int32_t since that's what they're decoded as. Not sure why uint32_t was used here. I could make them int64_t but I think there are downsides to that too - it forces all callers to use wider integer types than necessary. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@105 PS8, Line 105: /// copying the values to 'values'. > Can you add a comment on the return value, here and below? Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@134 PS8, Line 134: decoded > nit: decode Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@137 PS8, Line 137: static constexpr int LITERAL_BUFFER_LEN = 32; > out of curiosity, does this need constexpr instead of const? The latter sho No it doesn't - I believe they're equivalent for numeric values. I did miss defining the constant so did that at the bottom of the file. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@145 PS8, Line 145: exhaustive > nit: exhausted? Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@153 PS8, Line 153: bool FillLiteralBuffer() WARN_UNUSED_RESULT; > Do we give any guarantees what the caller can do after this returns false? I extended the class comment to describe how it should be used, including error handling. We don't provide any guarantees about the state after it returns false. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@159 PS8, Line 159: /// Output buffered literals and decrement 'literal_count_'. > Should we mention that it also increments 'literal_buffer_pos_'? Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@494 PS8, Line 494: boundery > nit: typo oops http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@498 PS8, Line 498: int num_read = bit_reader_.UnpackBatch(bit_width_, num_to_bypass, values + num_consumed); > nit: long line Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@585 PS8, Line 585: if (UNLIKELY(literal_count_ == 0)) return; > This line doesn't seem to have any effect? Is it there for perf reasons? If This was left in by mistake when the code got moved around. http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/README@121 PS7, Line 121: write > nit: writer Done http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/bitpacked_def_levels.patch File testdata/data/bitpacked_def_levels.patch: http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/bitpacked_def_levels.patch@1 PS7, Line 1: This won't pass RAT check. -- To view, visit http://gerrit.cloudera.org:8080/8267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6 Gerrit-Change-Number: 8267 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 20 Oct 2017 17:56:43 +0000 Gerrit-HasComments: Yes
