[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

Jim Apple (Code Review) Fri, 07 Oct 2016 11:13:35 -0700

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
......................................................................



Patch Set 7:

(20 comments)

Thanks for your patience

http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

PS7, Line 336: int64_t
const, or just make this the param to the data constructor and use data.size() 
as the last argument to the params initializer list.


PS7, Line 343:     suite.AddBenchmark(Substitute("UnpackScalar", bit_width), 
UnpackBenchmark, &params);
This line at bit_width 32 is not in the output you display in the comment at 
the top.


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS7, Line 35:  a subset of
            : /// 'num_in_values'
I still find this wording confusing - Which is the to buffer, and which the 
from? What does it mean to have a subset of an int?


PS7, Line 42: misaligned
When misalignment is 0, what alignment is the memory guaranteed to have? 0 mod 
64?


PS7, Line 49: uint32_t
const


Line 93:     uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, 
int misalignment) {
const T*


PS7, Line 94: int
const


PS7, Line 99: vector
std::aligned_storage?


PS7, Line 102:  sizeof(uint32_t)
Why is this needed?


PS7, Line 132: min(length, 1)
so the only misalignments allowed are 0 and 1? Why so restrictive?


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS2, Line 57: ast one return
> 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bi
OK, but you said "you have to choose" 8. A batch size of 32 would still work, 
right?


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS7, Line 30: values
'value'


PS7, Line 67:  bits of data
You can elide this phrase


http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS4, Line 125:     const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] % 
> That's true, but there's a downside - if the loads are aligned relative to 
I'm not so sure. Let's consider reading 10 bit ints using 32-bit loads. To 
unpack 16 of them with your method, you need to read 160 bits, which means 5 
(aligned) loads. 12 of the unpacked values can be read without using this third 
(more expensive) branch. In this branch, there are two more ALU ops, so that's 
4*2 = 8 extra ALU ops.

OTOH, using un-aligned loads, you could do only 4 loads and (at bits 0, 24, 48, 
and 80) and no more than two ALU ops for each extraction.


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS7, Line 57: const
constexpr BATCH_SIZE


PS7, Line 69:     in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT);
so in_bytes does not move if BIT_WIDTH is 7?


PS7, Line 91: should
Should this be "must", since  VALUE_IDX is now a template param? Not that the 
explanation is wrong, but now you have forced the caller into efficient 
behavior with your choice.


PS7, Line 178: const
constexpr MAX_BATCH_SIZE


PS7, Line 179: int
const


PS7, Line 183:   // Copy into padded temporary buffer to avoid reading past the 
end of 'in'.
Can this be avoided sometimes? For instance, if BIT_WIDTH is 6 and num_values 
is 16, can we just use 'in'?


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

Reply via email to