Todd Lipcon has posted comments on this change.

Change subject: Ported delta encoding from Impala to KUDU.

Patch Set 5:


here are comments on an older rev (I'd queued up a bunch so just stuck with 
reviewing that revision)
File src/kudu/cfile/packed_diff_block.h:

Line 43: // Delta encoding supports signed integer(INT64, INT32, INT16, INT8).
is this an implementation of an algorithm from the literature? one of Lemire's? 
Some tweak thereof? Would be good to reference it

Line 50: //  - list of bitwidths of miniblocks (one byte each)
can a miniblock have a bitwidth of 0 (eg in the case that you have purely 
sequential data?)

Line 54: //  - list of miniblocks: each miniblock is a list of bit packed
this doesn't make clear what the bit packed ints represent - i.e whether it's 
differential (ie each int is a delta from the previous one) or what.

PS5, Line 56: So setting the number of deltas stored in one miniblock
this should refer to kNumEntriesPerMiniBlock below and make it more clear that 
it's set to a fixed number that's part of the format (not a choice).

Line 84:       if (num_elems_ == 0) {
can move this block into the if block on line 90 to get rid of a branch in the 
hot path

Line 86:         cur_mblock_.num_elems = 0;
why not initialize this in Reset()? seems like it would make more sense there 
plus make the tight loop have a bit less code

Line 90:       if (cur_mblock_.num_elems == 0) {
probably PREDICT_FALSE here (only 1/128 of iterations hit it, right?)

Line 108:       if (cur_mblock_.num_elems < kEntriesPerMiniBlock) {

Line 166:     }
layout wise, we should consider advantages and disadvantages of this layout vs 
an "interleaved" layout, where we have all the info for a given miniblock 
contiguous -- maybe some cache miss differences

PS5, Line 184: boost::make_unsigned<

Line 214:     // Caculate the number of bits needed to store the diff between 
max_delta and min_delta.
typo: calculateh

Line 232:   // The number of deltas stored in one miniblock
worth a comment here saying that this is a multiple of 8 so that the miniblocks 
end up aligned

PS5, Line 252: OVERRIDE
can use 'override' now (same elsewhere in this file)

PS5, Line 257: Header corruption"
let's make these more specific to this code path, like 'PackedDiffBlock header 
too short'

PS5, Line 268: Meta-data section corruption

Line 271:     data_bit_offset_ = bit_reader_.position();
is there a guarantee that we are byte-aligned at this point? I think so, 
considering we've only read whole bytes. worth a DCHECK to document the 
invariant here.

PS5, Line 292: (c
nit: dont need the extra parens here

Line 319:       // Iterate the elements in this miniblock until we find the 
element that is equal or greater
can we not std::find (binary search) here? with 128 elements I think binary 
search is worth it. Similarly maybe binary search is worth it above to find the 
right miniblock?

Line 333:     if (cur_idx_ == num_elems_)
add {}, maybe PREDICT_FALSE

Line 363:     return (num_elems_ - cur_idx_) > 0;
why not the more straight forward 'cur_idx < num_elems_'? (also less apt to 
have an int underflow)

PS5, Line 373: k 
can we find a better name for this var?

Line 381:       if (pending_.size() == 0) {

Line 387:     if (pending_.size() > 0) {

Line 400:       
this cast seems a no-op
File src/kudu/util/bit-stream-utils.inline.h:

Line 120:   int bytes_remaining = max_bytes_ - byte_offset_;
can this just be a call to BufferValues()?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I1446a78f22773c28a7cc877fbe861697e39b2af8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Reviewer: David Ribeiro Alves <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to