Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8319 )
Change subject: IMPALA-4123: Columnar decoding in Parquet
......................................................................

Patch Set 17:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@165
PS17, Line 165: numeric_limits<int32_t>::max();
> Using such large values worries me a bit, it is easy to create an overflow
That does seem like something we'd like to prevent, but I also think adding more complexity by threading through another parameter is a bad idea. It makes it hard to tell what's an invariant of the code and what's unnecessary defensive code. In some ways it's better to just use the maximum valid value here and use assertions to flush out any bugs rather than leave them latent. There's no guarantee in general that the number of levels returned by this decoder is < num_buffered_values_ or < numeric_limits<int32_t>::max(), so it wouldn't be preserving an invariant, just adding a special case. I added a small comment here justifying that it's the maximum run length allowed by the Parquet standard.

If we're concerned about overflows, one option is to switch to using int64_t throughout the column reader for handling value counts. I think we can convince ourselves that this will never overflow, since inputs like batch_size and the Parquet run lengths are all guaranteed to fit in an int32_t. I can go ahead with that if you agree.
http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@648
PS17, Line 648: materializing
> This seems a bit misleading to me: we do not have to materialize the batch
Done

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@667
PS17, Line 667: num_buffered_values_
> num_buffered_values_ >= def_level_repeats should be always true at this poi
I'm not sure if it's technically invalid, but I agree it would be a weird way to encode the file. I looked at adding an UNLIKELY and I think it's a net loss - min() is more readable to me, and there shouldn't be a measurable performance difference outside of the inner loop.

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@769
PS17, Line 769: if (UNLIKELY(!ConvertSlot(curr_val, tuple->GetSlot(tuple_offset_)))) {
> Doing the conversion after the validation check means that timestamps that
I added a TODO to the timestamp ConvertSlot() implementation.

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@834
PS17, Line 834: StrideWriter<InternalType> out{out_vals, stride};
             : // Hoist data_ into a local variable to prevent GCC from storing it every loop
             : // iteration.
             : uint8_t* data = data_;
             : for (int64_t i = 0; i < count; ++i, out.SkipNext(1)) {
             :   if (UNLIKELY(!DecodeValue<Encoding::PLAIN>(&data, data_end_, out.current))) {
             :     return false;
             :   }
             : }
             : data_ = data;
> I think that ParquetPlainEncoder would be a better place for this logic, e.
Done

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@859
PS17, Line 859: int pos_slot_offset = pos_slot_desc()->tuple_offset();
             : for (int64_t i = 0; i < num_to_read; ++i, tuple_mem += tuple_size) {
             :   Tuple* tuple = reinterpret_cast<Tuple*>(tuple_mem);
             :   ReadPositionBatched(rep_levels_.CacheGetNext(),
             :       reinterpret_cast<int64_t*>(tuple->GetSlot(pos_slot_offset)));
             : }
> This could be also simplified a bit by using stride writer.
Done.
Also added a new method to StrideWriter that returns the current element and advances, since that pattern has come up a few times now.

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/util/mem-util.h
File be/src/util/mem-util.h:

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/util/mem-util.h@41
PS17, Line 41: // memcpy() is necessary because 'current' may not be aligned.
> In which case can it point the unaligned memory? Normally it should point t
There's no general guarantee that slots in tuples are aligned. The code is a bit muddled because some places attempt to maintain alignment, but for as long as I've been working on the code there have been cases where tuples aren't aligned. E.g. various operators like the Sorter, Join and Agg didn't add padding after variable-length data to ensure that the start of the next tuple was aligned.

Generally I think we should be moving towards packing tuples as densely as possible instead of trying to align things, simply because the performance penalty of unaligned memory access on modern x86 is minimal and cache capacity is much more of a limiting factor. E.g. the perf results on IMPALA-7367 showed a big difference.
--
To view, visit http://gerrit.cloudera.org:8080/8319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef
Gerrit-Change-Number: 8319
Gerrit-PatchSet: 17
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Nov 2018 20:44:48 +0000
Gerrit-HasComments: Yes