Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8267 )

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
......................................................................


Patch Set 8:

(18 comments)

Thank you for reworking this code. I did a pass over the low level files and 
left mostly questions and minor suggestions. I still need to look at the 
integration into the column readers and the tests and benchmarks.

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@67
PS8, Line 67:   static std::pair<const uint8_t*, int64_t> UnpackValues(const 
uint8_t* __restrict__ in,
Could this one be private?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@81
PS8, Line 81:   static std::pair<const uint8_t*, int64_t> UnpackAndDecodeValues(
Could this one be private?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@86
PS8, Line 86:   /// Unpack exactly 32 values of 'bit_width' from 'in' to 'out'. 
'in' must point to
Is there a benefit of having those public over making them private and make the 
tests that use them a friend?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@114
PS8, Line 114: 'bit_width'
Is this comment still correct? From looking at the implementation I also 
thought that this returns at most 31 values and that the caller has to make 
sure to set num_values accordingly.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.inline.h@56
PS8, Line 56:     return UnpackValues<OutType, i>(in, in_bytes, num_values, 
out);
nit: I think adding a blank line between the definition of the macro and 
calling it may increase readability. Feel free to ignore though.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.h
File be/src/util/bit-stream-utils.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.h@33
PS8, Line 33: /// TODO: replace uses with the more efficient BatchedBitReader.
Would that mean replacing the Writer with the BatchedReader?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.inline.h
File be/src/util/bit-stream-utils.inline.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.inline.h@115
PS8, Line 115:   auto result = BitPacking::UnpackAndDecodeValues(bit_width, 
buffer_pos_,
You could use std::tie to unpack the result.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@93
PS8, Line 93:   uint32_t NextNumRepeats();
Could we convert all the uint32_t to int64_t while we're here? There's already 
a JIRA to do this and I think we generally try to avoid unsigned types unless 
there's a specific reason.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@105
PS8, Line 105:   /// copying the values to 'values'.
Can you add a comment on the return value, here and below?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@134
PS8, Line 134: decoded
nit: decode


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@137
PS8, Line 137:   static constexpr int LITERAL_BUFFER_LEN = 32;
out of curiosity, does this need constexpr instead of const? The latter should 
be sufficient to specify the array size below.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@145
PS8, Line 145: exhaustive
nit: exhausted?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@153
PS8, Line 153:   bool FillLiteralBuffer() WARN_UNUSED_RESULT;
Do we give any guarantees what the caller can do after this returns false? 
Should we mention that they have to call Reset() to get the object back into a 
usable state?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@159
PS8, Line 159:   /// Output buffered literals and decrement 'literal_count_'.
Should we mention that it also increments 'literal_buffer_pos_'?

What is the return value?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@494
PS8, Line 494: boundery
nit: typo


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@498
PS8, Line 498:     int num_read = bit_reader_.UnpackBatch(bit_width_, 
num_to_bypass, values + num_consumed);
nit: long line


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@585
PS8, Line 585:     if (UNLIKELY(literal_count_ == 0)) return;
This line doesn't seem to have any effect? Is it there for perf reasons? If so, 
can you add a brief comment?


http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/README@121
PS7, Line 121: write
nit: writer



--
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 19 Oct 2017 23:04:15 +0000
Gerrit-HasComments: Yes

Reply via email to