Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
......................................................................


Patch Set 2:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 296:         reader.Reset(p->data, p->data_len);
> Should offset advance in this loop, then?
The code was a bit obfuscated. The loop only really retries once, so there's no 
need to advance offset. Changed it to be more obvious. Offset is advanced once 
per value read by virtue of the offset calculation above.


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 270: #include "common/names.h"
> Why the blank line here?
We separate out common/names.h like this most places because it needs to come 
after all the other includes.


PS3, Line 277: 
> Maybe NUM_OUT_VALUES
Done


PS3, Line 305: * p
> const uint8_t * const data_end ...
Done


Line 334:     for (int i = 0; i < data_len; ++i) {
> std::iota
Done


PS3, Line 336: 
> I think you can even elide the parens
Done


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS3, Line 29: compute_mask
> ComputeMask, and in an anonymous namespace
Done


PS3, Line 33: UnpackSubset
> With a comment here, if I'm going to use it in PackUnpack.
Done


PS3, Line 37: with memory that is
> this phrase can be omitted
Done


PS3, Line 39: int
> unsigned? DCHECK <= something?
I don't think there's a particular limit on the number of values we could use 
here.


PS3, Line 39: num_values
> num_in_values?
Done


PS3, Line 39: int
> also unsigned?
I think int should be ok - I normally prefer using signed types by default even 
for variables that logically should always be positive because it's easier to 
reason about the number going negative compared with underflowing (also easier 
to catch the out-of-range value with dchecks).


PS3, Line 45: int
> const, here and elsewhere
Done


PS3, Line 51: & compute_mask(bit_width)
> AKA mask; already hoisted
Done


PS3, Line 58: becase
> because
Done


PS3, Line 58: the input buffer size
> "the input buffer size (num_values)"
Done


PS3, Line 60:     // Size buffer exactly so that ASAN can detect buffer 
overruns.
> I think manual canaray checks might be called for here
We rely on ASAN to detect read overruns anyway (which I was more concerned 
about since they're tricky to avoid in the code), so it doesn't seem worth it 
to me on balance to add the extra test code.


PS3, Line 65: ASSERT_EQ
> Can you add more " << error message " to your ASSERTs?
Done


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

Line 36:   static const std::pair<const uint8_t*, int64_t> UnpackValues(int 
bit_width,
> Yeah, the verbosity is frustrating. However, cstdint doesn't necessarily pt
It seems like most implementations of cstdint put the typedefs in the global 
namespace anyway, so I think we're ok for now. If that turns out to be a 
problem I think we should create a wrapper header with the using declarations.


PS2, Line 57: Unpack32Values
> So, this boundary condition happens for any multiple of 8, and you chose 32
A batch size of 32 is the smallest that works for all supported bit widths. 
Otherwise you have to choose different batch sizes for different bit widths 
(e.g 32 for bit width 31, 8 for bit width 3, etc), which gets more complicated.

Lemire's bitpacking libraries use 32 everywhere (https://github.com/lemire/) so 
it seemed like a reasonable choice - he obviously put some effort into 
validating the performance.

I could benchmark different numbers but I think at this point I'd be optimising 
for the microbenchmark rather than a real workload.


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS3, Line 55: IT_WIDTH from 'in' to 'out'.
> But the last byte of 'in' that was read might only have been partially used
That's correct. I deliberately made the decision that these functions shouldn't 
deal with resuming reads in the middle of bytes, because it makes things way 
more complicated (and probably slower). I expect that callers will either read 
in batches or do their own buffering if they need to read a value at a time.

Expanded on the comment to make this explicit and explain what the caller can 
do.


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS3, Line 56: >(in,
> can be omitted if the branches are swapped
Done


Line 56:     case 21: return UnpackValues<OutType, 21>(in, in_bytes, 
num_values, out);
> long line
Done


PS3, Line 90: 
            :   if (r
> I think you accidentally a word
I ended up retrying the template function thing and it works ok this time 
around. I think something else was confounding it before (perhaps using #pragma 
unroll instead of manually unrolling the loop).


PS3, Line 93: t
> If 'i' is a compile-time constant, maybe call it "I" or "NTH"?
It's now a template arg, so renamed too.


PS3, Line 93: _values, 
> I think "LOAD_TYPE" could be "LoadType" to more clearly match the lexical s
Done


PS3, Line 96: 
> I think constexprs should get the static const treatment of all caps
Done


PS3, Line 106: 
> Can you add a comment describing what this is for?
renamed to lower_bits, since it seemed like a better name after writing the 
comment.


PS3, Line 108: me
> How is this 32 derived?
Mainly because 32-bit values are the largest we support. This changed a bit 
with the switch to a template. Now we just return LoadType, then the caller can 
truncate it.


PS3, Line 109: as possible an
> so, this case al the bits we need are in one word?
Yes, good point, added that to the comment.


PS3, Line 119: expr int first_bit_offset = first_bit % load_bit_wid
> Maybe give it an unsigned type?
I changed the types to unsigned throughout this function since that seems more 
appropriate. However, here the calculation just underflows and produces a 
different spurious warning, so I still need the ternary operator.


PS3, Line 124:  LOAD_TYPE shifted = BIT_WIDTH > 0 ?                             
      
> this could use a bit more explanation
It's now the same problem as the above one, so the comment is the same now.


PS3, Line 143:      c
> Check not already defined?
I'm not quite sure that I understand the intent or how I'd do that, aside from 
something verbose like:

#ifdef UNPACK_VALUE_CALL
#error "Shouldn't be defined"
#endif


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to