Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8319 )
Change subject: IMPALA-4123: Columnar decoding in Parquet
......................................................................

Patch Set 17:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@165
PS17, Line 165: numeric_limits<int32_t>::max();
> Using such large values worries me a bit, it is easy to create an overflow
That does seem like something we'd like to prevent, but I also think adding more complexity by threading through another parameter is a bad idea. It makes it hard to tell what's an invariant of the code and what's unnecessary defensive code. In some ways it's better to just use the maximum valid value here and use assertions to flush out any bugs rather than leave them latent. There's no guarantee in general that the number of levels returned by this decoder is < num_buffered_values_ or < numeric_limits<int32_t>::max(), so it wouldn't be preserving an invariant, just adding a special case. I added a small comment here justifying that it's the maximum run length allowed by the Parquet standard.

If we're concerned about overflows, one option is to switch to using int64_t throughout the column reader for handling value counts. I think we can convince ourselves that this will never overflow, since inputs like batch_size and the Parquet run lengths are all guaranteed to fit in an int32_t. I can go ahead with that if you agree.
http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@648
PS17, Line 648: materializing
> This seems a bit misleading to me: we do not have to materialize the batch
Done

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@667
PS17, Line 667: num_buffered_values_
> num_buffered_values_ >= def_level_repeats should be always true at this poi
I'm not sure if it's technically invalid, but I agree it would be a weird way to encode the file. I looked at adding an UNLIKELY and I think it's a net loss - min() is more readable to me, and there shouldn't be a measurable performance difference outside of the inner loop.

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@769
PS17, Line 769: if (UNLIKELY(!ConvertSlot(curr_val, tuple->GetSlot(tuple_offset_)))) {
> Doing the conversion after the validation check means that timestamps that
I added a TODO to the timestamp ConvertSlot() implementation.

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@834
PS17, Line 834: StrideWriter<InternalType> out{out_vals, stride};
             : // Hoist data_ into a local variable to prevent GCC from storing it every loop
             : // iteration.
             : uint8_t* data = data_;
             : for (int64_t i = 0; i < count; ++i, out.SkipNext(1)) {
             :   if (UNLIKELY(!DecodeValue<Encoding::PLAIN>(&data, data_end_, out.current))) {
             :     return false;
             :   }
             : }
             : data_ = data;
> I think that ParquetPlainEncoder would be a better place for this logic, e.
Done

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/exec/parquet-column-readers.cc@859
PS17, Line 859: int pos_slot_offset = pos_slot_desc()->tuple_offset();
             : for (int64_t i = 0; i < num_to_read; ++i, tuple_mem += tuple_size) {
             :   Tuple* tuple = reinterpret_cast<Tuple*>(tuple_mem);
             :   ReadPositionBatched(rep_levels_.CacheGetNext(),
             :       reinterpret_cast<int64_t*>(tuple->GetSlot(pos_slot_offset)));
             : }
> This could be also simplified a bit by using stride writer.
Done.
Also added a new method to StrideWriter that returns the current element and advances, since that pattern has come up a few times now.

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/util/mem-util.h
File be/src/util/mem-util.h:

http://gerrit.cloudera.org:8080/#/c/8319/17/be/src/util/mem-util.h@41
PS17, Line 41: // memcpy() is necessary because 'current' may not be aligned.
> In which case can it point the unaligned memory? Normally it should point t
There's no general guarantee that slots in tuples are aligned. The code is a bit muddled because some places attempt to maintain alignment, but for as long as I've been working on the code there have been cases where tuples aren't aligned. E.g. various operators like the Sorter, Join and Agg didn't add padding after variable-length data to ensure that the start of the next tuple was aligned.

Generally I think we should be moving towards packing tuples as densely as possible instead of trying to align things, simply because the performance penalty of unaligned memory access on modern x86 is minimal and cache capacity is much more of a limiting factor. E.g. the perf results on IMPALA-7367 showed a big difference.
--
To view, visit http://gerrit.cloudera.org:8080/8319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef
Gerrit-Change-Number: 8319
Gerrit-PatchSet: 17
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Nov 2018 20:44:48 +0000
Gerrit-HasComments: Yes