Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking ......................................................................
Patch Set 2: (34 comments) http://gerrit.cloudera.org:8080/#/c/4494/2//COMMIT_MSG Commit Message: PS2, Line 12: was > "want" Done PS2, Line 18: possible > long line Done http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS2, Line 259: include > I think cmath, cstdlib, cstdio are the non-deprecated versions for C++ Done Line 260: #include <stdlib.h> > Can you arrange these in three blocks: Done Line 276: #define NUM_VALUES 1024 * 1024 > constexpr int Done PS2, Line 281: BenchmarkParams > You can leave the constructor out and use aggregate initialization: Done. I didn't realise you could use this with new Line 290: BenchmarkParams* p = reinterpret_cast<BenchmarkParams*>(data); > maybe const auto p = reinterpret_cast<const BenchmarkParams *> I added the const. It doesn't seem verbose enough to require auto. PS2, Line 294: int64_t > const Done PS2, Line 294: i * 32 % NUM_VALUES + j; > Shouldn't this while thing be modulo NUM_VALUES, not just the non-+j part? Done Line 296: reader.Reset(p->data, p->data_len); > Why Reset? We need to do this to start the reader reading from the start of the buffer again. PS2, Line 333: uint8_t* data = new uint8_t[data_len]; > unique_ptr to clean up after Changed to a vector PS2, Line 334: int > int64_t Done PS2, Line 337: BenchmarkParams > This can be a local var, right? Done http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 18: #include <stdint.h> > cstdint Done PS2, Line 29: Unpack bit-packed values > What does this phrase mean? I added an explanation to the class comment, probably needed to make the rest of the code understandable. PS2, Line 30: outputted > unpacked Done PS2, Line 30: is > are Done PS2, Line 30: 'out' must have : /// enough space for 'num_values'. > And in must point to at least in_bytes of addressable memory? Done PS2, Line 31: 0 > What does a zero bit_width mean in this context? Added an explanation to the class comment. It means every value is zero. PS2, Line 33: plus > "And also" Done PS2, Line 32: utType. : /// Retu > extra line break here, please Done PS2, Line 33: the byte > a pointer to the byte Done PS2, Line 33: the number of : /// values that were read. > I can infer that by just subtracting, right? I think the pair might be over It's the number of logical values that were read. The calculation is a function of the input arguments: min(num_values, in_bytes / bit_width) but it seems like the kind of thing that is easy to get wrong - that was the reason for making in_bytes a parameter rather than letting the caller do the calculation. Also division by bit_width is slower here than inside the function where we're dispatching to bit_width-specialized code anyway. PS2, Line 36: const > This const has no effect, I believe. Done Line 36: static const std::pair<const uint8_t*, int64_t> UnpackValues(int bit_width, > std::uint8_t, etc. We don't do this anywhere else, seems unnecessarily verbose. PS2, Line 57: Unpack32Values > Why not up to 64? Added some explanation to the class comment for the magic number. It works out because for 32-bit packed values, the end of the packed batch always falls on a byte boundary, so there's no need to deal with partial bytes between batches. If we went to larger batches I think the additional benefit from amortising loop overheads is fairly minimal and the code will get a lot bigger. http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 35: case 0: return UnpackValues<OutType, 0>(in, in_bytes, num_values, out); > BOOST_PP_REPEAT_FROM_TO would be worth it here, I think Done here and elsewhere. I'm a little concerned that it makes it hard to debug any compile errors, but hopefully the code shouldn't change much. PS2, Line 70: <const uint8_t*, int64_t> > can be omitted In this case it can't be inferred from the arguments. PS2, Line 70: NULL > nullptr Done. Have we actually standardised on this? PS2, Line 78: DCHECK_GE > static_assert Done here and elsewhere. This seems to have made the debug build a lot faster, which is great. PS2, Line 79: 8 > CHAR_BIT, here and below Done PS2, Line 80: in_bytes * 8 / BIT_WIDTH > I think this line would be clearer with more parens Done PS2, Line 99: unsigned integer type > static_check with http://en.cppreference.com/w/cpp/types/is_unsigned? Done. PS2, Line 108: inlined as : // soon as possible and subject to all optimizations > Can you elaborate on this? Are there benchmarks on that? Reworded this. I did some experiments early on looking at the assembly and wasn't able to convince gcc to generate the optimal code without resorting to the macro approach. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
