William Butler created PARQUET-2124:
---------------------------------------

             Summary: Bad DCHECK For Intermixed Dictionary Encoding
                 Key: PARQUET-2124
                 URL: https://issues.apache.org/jira/browse/PARQUET-2124
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
            Reporter: William Butler
            Assignee: William Butler


Parquet CPP has a DCHECK against a dictionary-encoded page coming after a non-dictionary-encoded page. This is a problem because the DCHECK can be triggered by a Parquet file whose column has a dictionary page, then a non-dictionary-encoded page, then a page of dictionary-encoded values (indices). Fuzzing found such a file.

While this could be turned into an exception, I don't see anything in the Parquet specification that prohibits such a sequence of pages. This situation has been brought up on the mailing list before (https://lists.apache.org/thread/3bzymmbxvmzj12km7cjz1150ndvy9bos), and it seems that this is valid but nobody is doing it. In the PR that added this check (https://github.com/apache/parquet-cpp/pull/73), it was noted that the check is probably not needed.
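
To make the page sequence concrete, here is a minimal sketch of a reader that picks the decoder per page instead of asserting on the order. It is not the parquet-cpp code: `PageKind`, `Page`, and `DecodeColumn` are hypothetical names invented for this illustration, values are limited to `int32_t`, and the page payloads are assumed to be already decompressed and unpacked.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical page kinds for this sketch; these are not parquet-cpp types.
enum class PageKind { Dictionary, Plain, DictionaryIndices };

struct Page {
  PageKind kind;
  // Dictionary entries, plain values, or dictionary indices, depending on kind.
  std::vector<int32_t> payload;
};

// Decodes all pages of one column, choosing the decoder per data page.
// The dictionary is kept after a plain page, so a later page of indices
// can still be resolved instead of tripping an ordering assertion.
std::vector<int32_t> DecodeColumn(const std::vector<Page>& pages) {
  std::vector<int32_t> dictionary;
  bool have_dictionary = false;
  std::vector<int32_t> out;

  for (const Page& page : pages) {
    switch (page.kind) {
      case PageKind::Dictionary:
        dictionary = page.payload;
        have_dictionary = true;
        break;
      case PageKind::Plain:
        // Plain values are copied through; the dictionary stays available.
        out.insert(out.end(), page.payload.begin(), page.payload.end());
        break;
      case PageKind::DictionaryIndices:
        // Malformed input is reported as an exception, not a debug-only check.
        if (!have_dictionary) {
          throw std::runtime_error("index page without a dictionary page");
        }
        for (int32_t index : page.payload) {
          if (index < 0 ||
              static_cast<std::size_t>(index) >= dictionary.size()) {
            throw std::runtime_error("dictionary index out of range");
          }
          out.push_back(dictionary[static_cast<std::size_t>(index)]);
        }
        break;
    }
  }
  return out;
}
```

In this sketch, the sequence from the report (dictionary page, plain page, index page) decodes without error, while an index page that arrives with no dictionary at all raises an exception in both debug and release builds.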