Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8267 )
Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding ...................................................................... Patch Set 10: (21 comments) Mostly minor stuff that I came across when reading through the change. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@89 PS8, Line 89: in nit: double word http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@91 PS8, Line 91: a nit: an http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@98 PS8, Line 98: bool FillCacheBitPacked(int batch_size, int* num_cached_levels); This seems to be left over from refactoring but doesn't have an implementation. Remove? http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@106 PS8, Line 106: in nit: extra word http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@116 PS8, Line 116: parquet::Encoding::type encoding_ = parquet::Encoding::PLAIN; Can you add a comment to this one, too? http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@128 PS8, Line 128: string filename_; Can you add a comment for completeness here and include that it's only used for error reporting? http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@154 PS8, Line 154: there enough nit: missing word http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@354 PS8, Line 354: since we are only populating top-level tuples. I don't understand that part of the comment. Should it mean "We only need the repetition levels for populating the position slot if we are not populating top-level tuples."? http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h File be/src/util/bit-packing.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@67 PS8, Line 67: static std::pair<const uint8_t*, int64_t> UnpackValues(const uint8_t* __restrict__ in, > I feel like I created an artificial distinction by making some methods priv I like the suggestion to make them public except for the helper. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc File be/src/util/dict-test.cc: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@193 PS8, Line 193: // Make sure that SetData() clears any buffered literals. It looks like these are cleared in the underlying RleBatchDecoder inside the DictDecoder, which took me a while to understand. Can you expand the comment to make it more clear why this is an interesting test? Maybe it'd even be better to add a test for that behavior to rle-test.cc? http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@223 PS8, Line 223: } The first test in this file calls pool.FreeAll(). Is this needed here too? http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@232 PS8, Line 232: 8 These are 9 values. Am I missing something? http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@246 PS8, Line 246: vector<pair<string, vector<int>>> test_cases{ "using TestCase = pair<..>" could make this more concise. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@277 PS8, Line 277: had nit: have http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@93 PS8, Line 93: RleBatchDecoder(uint8_t* buffer, int buffer_len, int bit_width) { > It looks like the counts could actually be int32_t since that's what they'r sgtm http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc File be/src/util/rle-test.cc: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@78 PS8, Line 78: for (int i = 0; i < 8; ++i) { nit: single line http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@120 PS8, Line 120: BATCH_SIZE My feeling is that we use lower case for constants within local scopes, but I may be wrong. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@171 PS8, Line 171: if (!decoder->GetLiteralValues(num_literals_to_output, vals)) { nit: single line http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@319 PS8, Line 319: for (int i = 0; i < num_values; i++) { nit: single line http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@346 PS8, Line 346: EXPECT_EQ(0, decoded_values[i]); nit: single line http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README@122 PS8, Line 122: started started? -- To view, visit http://gerrit.cloudera.org:8080/8267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6 Gerrit-Change-Number: 8267 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 20 Oct 2017 22:32:14 +0000 Gerrit-HasComments: Yes
