Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8267 )

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
......................................................................


Patch Set 10:

(21 comments)

Mostly minor stuff that I came across when reading through the change.

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@89
PS8, Line 89: in
nit: double word


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@91
PS8, Line 91: a
nit: an


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@98
PS8, Line 98:   bool FillCacheBitPacked(int batch_size, int* num_cached_levels);
This seems to be left over from refactoring but doesn't have an implementation. 
Remove?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@106
PS8, Line 106: in
nit: extra word


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@116
PS8, Line 116:   parquet::Encoding::type encoding_ = parquet::Encoding::PLAIN;
Can you add a comment to this one, too?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@128
PS8, Line 128:   string filename_;
Can you add a comment for completeness here and include that it's only used for 
error reporting?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@154
PS8, Line 154: there enough
nit: missing word


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@354
PS8, Line 354: since we are only populating top-level tuples.
I don't understand that part of the comment. Should it mean "We only need the 
repetition levels for populating the position slot if we are not populating 
top-level tuples."?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@67
PS8, Line 67:   static std::pair<const uint8_t*, int64_t> UnpackValues(const 
uint8_t* __restrict__ in,
> I feel like I created an artificial distinction by making some methods priv
I like the suggestion to make them public except for the helper.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc
File be/src/util/dict-test.cc:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@193
PS8, Line 193: // Make sure that SetData() clears any buffered literals.
It looks like these are cleared in the underlying RleBatchDecoder inside the 
DictDecoder, which took me a while to understand. Can you expand the comment to 
make it more clear why this is an interesting test? Maybe it'd even be better 
to add a test for that behavior to rle-test.cc?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@223
PS8, Line 223:   }
The first test in this file calls pool.FreeAll(). Is this needed here too?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@232
PS8, Line 232: 8
These are 9 values. Am I missing something?


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@246
PS8, Line 246:   vector<pair<string, vector<int>>> test_cases{
"using TestCase = pair<..>" could make this more concise.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@277
PS8, Line 277: had
nit: have


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@93
PS8, Line 93:   RleBatchDecoder(uint8_t* buffer, int buffer_len, int bit_width) 
{
> It looks like the counts could actually be int32_t since that's what they'r
sgtm


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc
File be/src/util/rle-test.cc:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@78
PS8, Line 78:     for (int i = 0; i < 8; ++i) {
nit: single line


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@120
PS8, Line 120: BATCH_SIZE
My feeling is that we use lower case for constants within local scopes, but I 
may be wrong.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@171
PS8, Line 171:     if (!decoder->GetLiteralValues(num_literals_to_output, 
vals)) {
nit: single line


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@319
PS8, Line 319:     for (int i = 0; i < num_values; i++) {
nit: single line


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@346
PS8, Line 346:       EXPECT_EQ(0, decoded_values[i]);
nit: single line


http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README@122
PS8, Line 122: started
started?



--
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 20 Oct 2017 22:32:14 +0000
Gerrit-HasComments: Yes

Reply via email to