Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking ......................................................................
Patch Set 2: (33 comments) http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 296: reader.Reset(p->data, p->data_len); > Should offset advance in this loop, then? The code was a bit obfuscated. The loop only really retries once, so there's no need to advance offset. Changed it to be more obvious. Offset is advanced once per value read by virtue of the offset calculation above. http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 270: #include "common/names.h" > Why the blank line here? We separate out common/names.h like this most places because it needs to come after all the other includes. PS3, Line 277: > Maybe NUM_OUT_VALUES Done PS3, Line 305: * p > const uint8_t * const data_end ... Done Line 334: for (int i = 0; i < data_len; ++i) { > std::iota Done PS3, Line 336: > I think you can even elide the parens Done http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS3, Line 29: compute_mask > ComputeMask, and in an anonymous namespace Done PS3, Line 33: UnpackSubset > With a comment here, if I'm going to use it in PackUnpack. Done PS3, Line 37: with memory that is > this phrase can be omitted Done PS3, Line 39: int > unsigned? DCHECK <= something? I don't think there's a particular limit on the number of values we could use here. PS3, Line 39: num_values > num_in_values? Done PS3, Line 39: int > also unsigned? I think int should be ok - I normally prefer using signed types by default even for variables that logically should always be positive because it's easier to reason about the number going negative compared with underflowing (also easier to catch the out-of-range value with dchecks). PS3, Line 45: int > const, here and elsewhere Done PS3, Line 51: & compute_mask(bit_width) > AKA mask; already hoisted Done PS3, Line 58: becase > because Done PS3, Line 58: the input buffer size > "the input buffer size (num_values)" Done PS3, Line 60: // Size buffer exactly so that ASAN can detect buffer overruns. > I think manual canaray checks might be called for here We rely on ASAN to detect read overruns anyway (which I was more concerned about since they're tricky to avoid in the code), so it doesn't seem worth it to me on balance to add the extra test code. PS3, Line 65: ASSERT_EQ > Can you add more " << error message " to your ASSERTs? Done http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 36: static const std::pair<const uint8_t*, int64_t> UnpackValues(int bit_width, > Yeah, the verbosity is frustrating. However, cstdint doesn't necessarily pt It seems like most implementations of cstdint put the typedefs in the global namespace anyway, so I think we're ok for now. If that turns out to be a problem I think we should create a wrapper header with the using declarations. PS2, Line 57: Unpack32Values > So, this boundary condition happens for any multiple of 8, and you chose 32 A batch size of 32 is the smallest that works for all supported bit widths. Otherwise you have to choose different batch sizes for different bit widths (e.g 32 for bit width 31, 8 for bit width 3, etc), which gets more complicated. Lemire's bitpacking libraries use 32 everywhere (https://github.com/lemire/) so it seemed like a reasonable choice - he obviously put some effort into validating the performance. I could benchmark different numbers but I think at this point I'd be optimising for the microbenchmark rather than a real workload. http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS3, Line 55: IT_WIDTH from 'in' to 'out'. > But the last byte of 'in' that was read might only have been partially used That's correct. I deliberately made the decision that these functions shouldn't deal with resuming reads in the middle of bytes, because it makes things way more complicated (and probably slower). I expect that callers will either read in batches or do their own buffering if they need to read a value at a time. Expanded on the comment to make this explicit and explain what the caller can do. http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS3, Line 56: >(in, > can be omitted if the branches are swapped Done Line 56: case 21: return UnpackValues<OutType, 21>(in, in_bytes, num_values, out); > long line Done PS3, Line 90: : if (r > I think you accidentally a word I ended up retrying the template function thing and it works ok this time around. I think something else was confounding it before (perhaps using #pragma unroll instead of manually unrolling the loop). PS3, Line 93: t > If 'i' is a compile-time constant, maybe call it "I" or "NTH"? It's now a template arg, so renamed too. PS3, Line 93: _values, > I think "LOAD_TYPE" could be "LoadType" to more clearly match the lexical s Done PS3, Line 96: > I think constexprs should get the static const treatment of all caps Done PS3, Line 106: > Can you add a comment describing what this is for? renamed to lower_bits, since it seemed like a better name after writing the comment. PS3, Line 108: me > How is this 32 derived? Mainly because 32-bit values are the largest we support. This changed a bit with the switch to a template. Now we just return LoadType, then the caller can truncate it. PS3, Line 109: as possible an > so, this case al the bits we need are in one word? Yes, good point, added that to the comment. PS3, Line 119: expr int first_bit_offset = first_bit % load_bit_wid > Maybe give it an unsigned type? I changed the types to unsigned throughout this function since that seems more appropriate. However, here the calculation just underflows and produces a different spurious warning, so I still need the ternary operator. PS3, Line 124: LOAD_TYPE shifted = BIT_WIDTH > 0 ? > this could use a bit more explanation It's now the same problem as the above one, so the comment is the same now. PS3, Line 143: c > Check not already defined? I'm not quite sure that I understand the intent or how I'd do that, aside from something verbose like: #ifdef UNPACK_VALUE_CALL #error "Shouldn't be defined" #endif -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
