GitHub user sclmn added a comment to the discussion: Dictionary page offset logic
> Looking at the code flow, if the file is malformed (with a negative
> dictionary_page_offset) we potentially fail to catch the negative offset on
> line 189

I think the current code sets col_start to the data_page_offset, since we have the condition `column_metadata->dictionary_page_offset() > 0`, so I'm not sure how line 189 returns an error for a negative dictionary offset.

> Technically, for a valid parquet file neither offset should be zero because
> parquet has a magic number as its first four bytes. Without data pages it
> means the row group is empty, and in theory readers should skip it (e.g. this
> https://github.com/apache/parquet-java/pull/1018 does this)

The current code has the condition `col_start > column_metadata->dictionary_page_offset()`. If the data_page_offset is 0 (no data page), that condition is false, which prevents us from setting col_start to the dictionary page offset. This will trigger an error.

GitHub link: https://github.com/apache/arrow/discussions/48184#discussioncomment-15018934
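
To make the two cases above concrete, here is a minimal standalone sketch of the selection logic as I read it from the two conditions quoted in this thread. It is not the actual Arrow source: `ColumnChunkOffsets` and `ChooseColumnStart` are hypothetical names made up for illustration, and any other checks the real code performs are left out.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two metadata fields discussed in this thread.
struct ColumnChunkOffsets {
  int64_t dictionary_page_offset;
  int64_t data_page_offset;
};

// Sketch of the col_start choice using only the two quoted conditions:
//   dictionary_page_offset > 0  and  col_start > dictionary_page_offset.
int64_t ChooseColumnStart(const ColumnChunkOffsets& meta) {
  // Start from the data page offset by default.
  int64_t col_start = meta.data_page_offset;
  // Move the start back to the dictionary page only when its offset is
  // positive and lies before the current start.
  if (meta.dictionary_page_offset > 0 &&
      col_start > meta.dictionary_page_offset) {
    col_start = meta.dictionary_page_offset;
  }
  return col_start;
}
```

With this sketch, a negative dictionary offset such as `{-8, 100}` just falls through and yields 100, so nothing in the selection itself flags the negative value. With `{4, 0}` (data_page_offset of 0) the result stays 0, because `0 > 4` is false and the dictionary offset is never picked up.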
