Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17071 )

Change subject: IMPALA-10501: Hit DCHECK in parquet-column-readers.cc: 
def_levels_.CacheRemaining() <= num_buffered_values_
......................................................................


Patch Set 1: Code-Review+1

(1 comment)

I am ok with the solution, can upgrade to +2 if you mention 
ReadLevel()/PeakLevel() in the comment + commit message as I think that the 
padding issue only applies to those functions.

http://gerrit.cloudera.org:8080/#/c/17071/1/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17071/1/be/src/exec/parquet/parquet-column-readers.cc@518
PS1, Line 518:   // 'def_levels_.CacheRemaining()' might return more values 
than the actual value count.
             :   // It is because literal runs are padded with zeros to form 
groups of 8 and the
             :   // decoder doesn't have the information about the real value 
count.
I think that this is only true of if def_levels_.ReadLevel() was used instead 
of CacheNextBatch(batch_size), as CacheNextBatch() never reads more than 
batch_size, while ReadLevel() doesn't know the number of levels to expect and 
can fill the cache with the padding.

I guess that ReadLevel() was called here: 
https://github.com/apache/impala/blob/master/be/src/exec/parquet/parquet-column-readers.cc#L1290

So another way to do this would be change SkipTopLevelRows() to always use 
CacheNextBatch() for def_level_. Not sure if this worth the effort.



--
To view, visit http://gerrit.cloudera.org:8080/17071
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic071473e7b315300fd5e163225d3e39735f09c4f
Gerrit-Change-Number: 17071
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Tue, 16 Feb 2021 21:35:38 +0000
Gerrit-HasComments: Yes

Reply via email to