Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8267 )

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
......................................................................


Patch Set 8:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@89
PS8, Line 89: in
> nit: double word
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@91
PS8, Line 91: a
> nit: an
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@98
PS8, Line 98:   bool FillCacheBitPacked(int batch_size, int* num_cached_levels);
> This seems to be left over from refactoring but doesn't have an implementat
Yeah, I think I deleted it but forgot to save my buffer before committing.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@106
PS8, Line 106: in
> nit: extra word
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@116
PS8, Line 116:   parquet::Encoding::type encoding_ = parquet::Encoding::PLAIN;
> Can you add a comment to this one, too?
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@128
PS8, Line 128:   string filename_;
> Can you add a comment for completeness here and include that it's only used
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@154
PS8, Line 154: there enough
> nit: missing word
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@354
PS8, Line 354: since we are only populating top-level tuples.
> I don't understand that part of the comment. Should it mean "We only need t
The original comment is correct but unclear. We only go down this code path if 
we're flattening out a nested collection. In that case we don't care about the 
nesting structure unless the position slot is populated.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc
File be/src/util/dict-test.cc:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@193
PS8, Line 193: // Make sure that SetData() clears any buffered literals.
> It looks like these are cleared in the underlying RleBatchDecoder inside th
I originally added it because I had a bug here that was somewhat tricky to 
track down when it happened in an end-to-end test. Tried to improve the comment 
here.

The dictionary encoder itself has state (e.g. decoded_values_) so we wouldn't 
have the same coverage if we tested the RLE decoder in isolation. IIRC the bug 
I originally saw was in the dictionary decoder, not the RLE decoder.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@223
PS8, Line 223:   }
> The first test in this file calls pool.FreeAll(). Is this needed here too?
There aren't any strings being encoded so nothing is allocated from the pool 
(it'd DCHECK others). It's really just boilerplate for now.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@232
PS8, Line 232: 8
> These are 9 values. Am I missing something?
Nope bad comment.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@246
PS8, Line 246:   vector<pair<string, vector<int>>> test_cases{
> "using TestCase = pair<..>" could make this more concise.
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@277
PS8, Line 277: had
> nit: have
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc
File be/src/util/rle-test.cc:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@78
PS8, Line 78:     for (int i = 0; i < 8; ++i) {
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@120
PS8, Line 120: BATCH_SIZE
> My feeling is that we use lower case for constants within local scopes, but
I don't feel strongly either way and I'm probably not that consistent. I've 
been asked to use this casing before a few times by Jim so generally do it this 
way when I think about it.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@171
PS8, Line 171:     if (!decoder->GetLiteralValues(num_literals_to_output, 
vals)) {
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@319
PS8, Line 319:     for (int i = 0; i < num_values; i++) {
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@346
PS8, Line 346:       EXPECT_EQ(0, decoded_values[i]);
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README@122
PS8, Line 122: started
> started?
Done



--
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 23 Oct 2017 19:18:42 +0000
Gerrit-HasComments: Yes

Reply via email to