Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8267 )
Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding ...................................................................... Patch Set 8: (19 comments) http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@89 PS8, Line 89: in > nit: double word Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@91 PS8, Line 91: a > nit: an Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@98 PS8, Line 98: bool FillCacheBitPacked(int batch_size, int* num_cached_levels); > This seems to be left over from refactoring but doesn't have an implementat Yeah, I think I deleted it but forgot to save my buffer before committing. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@106 PS8, Line 106: in > nit: extra word Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@116 PS8, Line 116: parquet::Encoding::type encoding_ = parquet::Encoding::PLAIN; > Can you add a comment to this one, too? Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.h@128 PS8, Line 128: string filename_; > Can you add a comment for completeness here and include that it's only used Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@154 PS8, Line 154: there enough > nit: missing word Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/exec/parquet-column-readers.cc@354 PS8, Line 354: since we are only populating top-level tuples. > I don't understand that part of the comment. Should it mean "We only need t The original comment is correct but unclear. We only go down this code path if we're flattening out a nested collection. In that case we don't care about the nesting structure unless the position slot is populated. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc File be/src/util/dict-test.cc: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@193 PS8, Line 193: // Make sure that SetData() clears any buffered literals. > It looks like these are cleared in the underlying RleBatchDecoder inside th I originally added it because I had a bug here that was somewhat tricky to track down when it happened in an end-to-end test. Tried to improve the comment here. The dictionary encoder itself has state (e.g. decoded_values_) so we wouldn't have the same coverage if we tested the RLE decoder in isolation. IIRC the bug I originally saw was in the dictionary decoder, not the RLE decoder. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@223 PS8, Line 223: } > The first test in this file calls pool.FreeAll(). Is this needed here too? There aren't any strings being encoded so nothing is allocated from the pool (it'd DCHECK others). It's really just boilerplate for now. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@232 PS8, Line 232: 8 > These are 9 values. Am I missing something? Nope bad comment. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@246 PS8, Line 246: vector<pair<string, vector<int>>> test_cases{ > "using TestCase = pair<..>" could make this more concise. Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/dict-test.cc@277 PS8, Line 277: had > nit: have Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc File be/src/util/rle-test.cc: http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@78 PS8, Line 78: for (int i = 0; i < 8; ++i) { > nit: single line Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@120 PS8, Line 120: BATCH_SIZE > My feeling is that we use lower case for constants within local scopes, but I don't feel strongly either way and I'm probably not that consistent. I've been asked to use this casing before a few times by Jim so generally do it this way when I think about it. http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@171 PS8, Line 171: if (!decoder->GetLiteralValues(num_literals_to_output, vals)) { > nit: single line Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@319 PS8, Line 319: for (int i = 0; i < num_values; i++) { > nit: single line Done http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-test.cc@346 PS8, Line 346: EXPECT_EQ(0, decoded_values[i]); > nit: single line Done http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8267/8/testdata/data/README@122 PS8, Line 122: started > started? Done -- To view, visit http://gerrit.cloudera.org:8080/8267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6 Gerrit-Change-Number: 8267 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Mon, 23 Oct 2017 19:18:42 +0000 Gerrit-HasComments: Yes
