Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17071 )
Change subject: IMPALA-10501: Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <= num_buffered_values_ ...................................................................... Patch Set 1: Code-Review+1 (1 comment) I am ok with the solution, can upgrade to +2 if you mention ReadLevel()/PeakLevel() in the comment + commit message as I think that the padding issue only applies to those functions. http://gerrit.cloudera.org:8080/#/c/17071/1/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17071/1/be/src/exec/parquet/parquet-column-readers.cc@518 PS1, Line 518: // 'def_levels_.CacheRemaining()' might return more values than the actual value count. : // It is because literal runs are padded with zeros to form groups of 8 and the : // decoder doesn't have the information about the real value count. I think that this is only true of if def_levels_.ReadLevel() was used instead of CacheNextBatch(batch_size), as CacheNextBatch() never reads more than batch_size, while ReadLevel() doesn't know the number of levels to expect and can fill the cache with the padding. I guess that ReadLevel() was called here: https://github.com/apache/impala/blob/master/be/src/exec/parquet/parquet-column-readers.cc#L1290 So another way to do this would be change SkipTopLevelRows() to always use CacheNextBatch() for def_level_. Not sure if this worth the effort. -- To view, visit http://gerrit.cloudera.org:8080/17071 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic071473e7b315300fd5e163225d3e39735f09c4f Gerrit-Change-Number: 17071 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Tue, 16 Feb 2021 21:35:38 +0000 Gerrit-HasComments: Yes
