GitHub user sclmn added a comment to the discussion: Dictionary page offset 
logic

>Looking at the code flow, if the file is malformed (with a negative 
>dictionary_page_offset) we potentially fail to catch the negative offset on 
>line 189

I think the current code leaves col_start at the data_page_offset in that
case, because the dictionary offset is only used when
`column_metadata->dictionary_page_offset() > 0` holds. So I'm not sure how
line 189 would return an error for a negative dictionary offset.

>Technically, for a valid parquet file neither offset should be zero because 
>parquet has a magic number as its first four bytes. Without data pages it 
>means the row group is empty, and in theory readers should skip it (e.g. this 
>https://github.com/apache/parquet-java/pull/1018 does this)

The current code also has the condition

`col_start > column_metadata->dictionary_page_offset()`

If the data_page_offset is 0 (no data page), this comparison is false, so
col_start is never set to the dictionary page offset. This will trigger an
error.
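
To make the two cases concrete, here is a minimal sketch of the selection logic as I read it from the two conditions quoted in this thread. `ColumnOffsets` and `ChooseColStart` are hypothetical names for illustration only; this is a simplified stand-in, not the actual Arrow source, and any later validation of col_start is left out.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two offsets taken from the column metadata.
struct ColumnOffsets {
  int64_t dictionary_page_offset;
  int64_t data_page_offset;
};

// Start from the data page offset and move to the dictionary page offset
// only when both conditions discussed above hold.
int64_t ChooseColStart(const ColumnOffsets& m) {
  int64_t col_start = m.data_page_offset;
  if (m.dictionary_page_offset > 0 &&
      col_start > m.dictionary_page_offset) {
    col_start = m.dictionary_page_offset;
  }
  return col_start;
}
```

With a negative dictionary offset, e.g. `ChooseColStart({-5, 100})`, the `> 0` guard fails and col_start stays at the data page offset, so the negative value is never looked at. With `ChooseColStart({4, 0})`, the second comparison fails and col_start stays at 0 rather than moving to the dictionary page.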


GitHub link: 
https://github.com/apache/arrow/discussions/48184#discussioncomment-15018934
