GitHub user emkornfield added a comment to the discussion: Dictionary page 
offset logic

> I'm not sure why there is a check that the dictionary page offset is greater 
> than 0? If this isn't a dictionary page, should it be not set (first 
> condition)?

Looking at the code flow, if the file is malformed (with a negative 
dictionary_page_offset), we could fail to catch the negative offset on 
line 189.
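
To make the concern concrete, here is a minimal sketch of how a reader might 
pick the start of a column chunk while rejecting negative offsets explicitly. 
This is not the Arrow code under discussion: the helper name 
`column_chunk_start` and its parameters are hypothetical, and treating a 
dictionary offset of 0 as "not set" is an assumption about what the `> 0` 
check is for.

```python
from typing import Optional


def column_chunk_start(data_page_offset: int,
                       dictionary_page_offset: Optional[int]) -> int:
    """Return the byte offset where a column chunk starts (hypothetical helper)."""
    # A negative offset can only come from a malformed file, so fail loudly
    # rather than letting it fall through a "> 0" comparison unnoticed.
    if data_page_offset < 0:
        raise ValueError(f"negative data_page_offset: {data_page_offset}")
    if dictionary_page_offset is None:
        return data_page_offset
    if dictionary_page_offset < 0:
        raise ValueError(
            f"negative dictionary_page_offset: {dictionary_page_offset}")
    # Assumption: 0 is treated as "no dictionary page". A real dictionary
    # page must also precede the first data page.
    if 0 < dictionary_page_offset < data_page_offset:
        return dictionary_page_offset
    return data_page_offset
```

With this shape, a negative dictionary offset raises an error instead of 
being silently ignored in favor of the data page offset.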

> Is it possible for the data page offset to equal 0 (when we don't have data 
> pages)?

Technically, for a valid Parquet file neither offset should be zero, because 
a Parquet file starts with a four-byte magic number. A column chunk without 
data pages means the row group is empty, and in theory readers should skip it 
(e.g. this [PR](https://github.com/apache/parquet-java/pull/1018) does this).
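
As a rough illustration of the magic-number argument: the first four bytes of 
a Parquet file are `PAR1`, so no page can start before byte 4. The function 
name `is_plausible_page_offset` and the `file_size` parameter are hypothetical, 
and the upper bound is only a loose sanity check, not what any particular 
reader does.

```python
PARQUET_MAGIC = b"PAR1"  # four-byte magic number at the start of the file


def is_plausible_page_offset(offset: int, file_size: int) -> bool:
    """Check that a page offset lies past the leading magic and inside the file."""
    # Offsets 0..3 are occupied by the magic number, so the earliest
    # possible page offset is len(PARQUET_MAGIC) == 4.
    return len(PARQUET_MAGIC) <= offset < file_size
```

Under this check, offsets of 0 and negative offsets are both rejected, while 
an offset of 4 is the smallest value accepted.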

GitHub link: 
https://github.com/apache/arrow/discussions/48184#discussioncomment-15018665
