Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking ......................................................................
Patch Set 7: (20 comments) Thanks for your patience http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS7, Line 336: int64_t const, or just make this the param to the data constructor and use data.size() as the last argument to the params initializer list. PS7, Line 343: suite.AddBenchmark(Substitute("UnpackScalar", bit_width), UnpackBenchmark, ¶ms); This line at bit_width 32 is not in the output you display in the comment at the top. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS7, Line 35: a subset of : /// 'num_in_values' I still find this wording confusing - Which is the to buffer, and which the from? What does it mean to have a subset of an int? PS7, Line 42: misaligned When misalignment is 0, what alignment is the memory guaranteed to have? 0 mod 64? PS7, Line 49: uint32_t const Line 93: uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, int misalignment) { const T* PS7, Line 94: int const PS7, Line 99: vector std::aligned_storage? PS7, Line 102: sizeof(uint32_t) Why is this needed? PS7, Line 132: min(length, 1) so the only misalignments allowed are 0 and 1? Why so restrictive? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS2, Line 57: ast one return > 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bi OK, but you said "you have to choose" 8. A batch size of 32 would still work, right? http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS7, Line 30: values 'value' PS7, Line 67: bits of data You can elide this phrase http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS4, Line 125: const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] % > That's true, but there's a downside - if the loads are aligned relative to I'm not so sure. Let's consider reading 10 bit ints using 32-bit loads. To unpack 16 of them with your method, you need to read 160 bits, which means 5 (aligned) loads. 12 of the unpacked values can be read without using this third (more expensive) branch. In this branch, there are two more ALU ops, so that's 4*2 = 8 extra ALU ops. OTOH, using un-aligned loads, you could do only 4 loads and (at bits 0, 24, 48, and 80) and no more than two ALU ops for each extraction. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS7, Line 57: const constexpr BATCH_SIZE PS7, Line 69: in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT); so in_bytes does not move if BIT_WIDTH is 7? PS7, Line 91: should Should this be "must", since VALUE_IDX is now a template param? Not that the explanation is wrong, but now you have forced the caller into efficient behavior with your choice. PS7, Line 178: const constexpr MAX_BATCH_SIZE PS7, Line 179: int const PS7, Line 183: // Copy into padded temporary buffer to avoid reading past the end of 'in'. Can this be avoided sometimes? For instance, if BIT_WIDTH is 6 and num_values is 16, can we just use 'in'? -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
