William Butler created PARQUET-2124:
---------------------------------------

             Summary: Bad DCHECK For Intermixed Dictionary Encoding
                 Key: PARQUET-2124
                 URL: https://issues.apache.org/jira/browse/PARQUET-2124
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
            Reporter: William Butler
            Assignee: William Butler


Parquet CPP has a DCHECK that fires when a dictionary-encoded page comes after 
a non-dictionary-encoded page. This is a problem because the DCHECK can be 
triggered by a Parquet file whose column has a dictionary page, then a 
non-dictionary-encoded page, then a page of dictionary-encoded values 
(indices). Fuzzing found such a file. While the DCHECK could be turned into an 
exception, I don't see anything in the Parquet specification that prohibits 
such a sequence of pages.
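
To make the page sequence concrete, here is a minimal sketch of the rule a 
reader could enforce instead of the DCHECK. It is not parquet-cpp code: the 
PageKind enum and the CanDecode function are hypothetical names invented for 
illustration. The only assumption it encodes is that a page of dictionary 
indices needs a dictionary page somewhere earlier in the column chunk, and that 
the order of plain and dictionary-encoded data pages is otherwise unrestricted.

```cpp
#include <cassert>  // used only by the usage example described below
#include <vector>

// Hypothetical page kinds; illustrative names, not parquet-cpp's types.
enum class PageKind { Dictionary, DataPlain, DataDictIndices };

// Returns true if every dictionary-encoded data page is preceded by a
// dictionary page. Plain and dictionary-encoded data pages may be intermixed
// in any order once the dictionary has been seen.
inline bool CanDecode(const std::vector<PageKind>& pages) {
  bool have_dictionary = false;
  for (PageKind kind : pages) {
    switch (kind) {
      case PageKind::Dictionary:
        have_dictionary = true;
        break;
      case PageKind::DataPlain:
        break;  // never depends on the dictionary
      case PageKind::DataDictIndices:
        if (!have_dictionary) return false;  // indices with no dictionary
        break;
    }
  }
  return true;
}
```

Under this sketch the sequence found by fuzzing (Dictionary, DataPlain, 
DataDictIndices) is accepted, while indices that appear with no dictionary page 
before them are rejected, which is the case where an exception would make 
sense.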

This situation has been brought up on the mailing list before 
(https://lists.apache.org/thread/3bzymmbxvmzj12km7cjz1150ndvy9bos), and it 
seems that this is valid but nobody is doing it.

In the PR that added this check 
(https://github.com/apache/parquet-cpp/pull/73), it was noted that the check 
is probably not needed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
