[ https://issues.apache.org/jira/browse/PARQUET-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978702#comment-14978702 ]
Ryan Blue commented on PARQUET-389:
-----------------------------------

I agree, assuming that by "merged" you mean resolving the requested schema against different file schemas.

> Filter predicates should work with missing columns
> --------------------------------------------------
>
> Key: PARQUET-389
> URL: https://issues.apache.org/jira/browse/PARQUET-389
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.6.0, 1.7.0, 1.8.0
> Reporter: Cheng Lian
>
> This issue originates from SPARK-11103, which contains detailed information
> about how to reproduce it.
> The major problem here is that pushed-down filter predicates assert that the
> columns they touch must exist in the target physical files. This isn't true
> in the case of schema merging.
> Actually, this assertion is unnecessary: if a column is missing from the
> filter schema, the column is considered to be filled with nulls, and all
> filters should be able to act accordingly. For example, if we push down
> {{a = 1}} but {{a}} is missing from the underlying physical file, all records
> in this file should be dropped, since {{a}} is always null. On the other hand,
> if we push down {{a IS NULL}}, all records should be preserved.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
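The following is a minimal sketch of the null semantics proposed in the issue. It is written in Java because parquet-mr is a Java project. The names `Predicate`, `Eq`, `IsNull`, `canDropFile`, and `keepRecord` are hypothetical and do not belong to parquet-mr's `FilterApi`. The sketch assumes SQL-style semantics, under which `a = 1` is never true when `a` is null. It shows the suggested change in behavior: a missing column reads as null and does not trigger an assertion, so each predicate's own null handling decides whether a file's records are dropped or kept.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of pushed-down filter predicates. These are NOT
// parquet-mr's FilterApi classes. A column absent from a file's schema is treated as all nulls.
public class MissingColumnFilters {

  /** A pushed-down predicate over a single column (hypothetical interface). */
  interface Predicate {
    String column();

    /** Evaluates the predicate against one record's value, which may be null. */
    boolean test(Object value);
  }

  /** {@code column = value}; never true for a null value (assumed SQL semantics). */
  record Eq(String column, Object value) implements Predicate {
    public boolean test(Object v) {
      return v != null && v.equals(value);
    }
  }

  /** {@code column IS NULL}. */
  record IsNull(String column) implements Predicate {
    public boolean test(Object v) {
      return v == null;
    }
  }

  /**
   * Decides whether a whole file can be skipped. A missing column is treated as all nulls
   * and does not fail an assertion. If the predicate rejects null, no record can match,
   * so the file is dropped.
   */
  static boolean canDropFile(Predicate p, Set<String> fileColumns) {
    if (!fileColumns.contains(p.column())) {
      return !p.test(null);
    }
    return false; // column present: defer to statistics / record-level filtering (not modeled)
  }

  /** Record-level evaluation: a column missing from the record reads as null. */
  static boolean keepRecord(Predicate p, Map<String, Object> record) {
    return p.test(record.get(p.column()));
  }

  public static void main(String[] args) {
    Set<String> fileColumns = Set.of("b"); // a file written without column "a"
    System.out.println(canDropFile(new Eq("a", 1), fileColumns));    // true: a is always null
    System.out.println(canDropFile(new IsNull("a"), fileColumns));   // false: all records kept
    System.out.println(keepRecord(new IsNull("a"), Map.of("b", 2))); // true
  }
}
```

In this sketch the file-level check handles only the missing-column case. When the column exists, real filtering would rely on column statistics or per-record evaluation, and the sketch does not model either. Records require Java 16 or later.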