[jira] [Commented] (PARQUET-389) Filter predicates should work with missing columns

Ryan Blue (JIRA) Wed, 28 Oct 2015 09:39:40 -0700

    [ https://issues.apache.org/jira/browse/PARQUET-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978702#comment-14978702 ]


Ryan Blue commented on PARQUET-389:
-----------------------------------

I agree, assuming that by "merged" you mean resolving the requested schema 
against different file schemas.

> Filter predicates should work with missing columns
> --------------------------------------------------
>
>                 Key: PARQUET-389
>                 URL: https://issues.apache.org/jira/browse/PARQUET-389
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0
>            Reporter: Cheng Lian
>
> This issue originates from SPARK-11103, which contains detailed information 
> about how to reproduce it.
> The major problem here is that pushed-down filter predicates assert that the 
> columns they touch exist in the target physical files. But this isn't true 
> in the case of schema merging.
> Actually, this assertion is unnecessary: if a column is missing in the 
> filter schema, the column is considered to be filled with nulls, and all 
> filters should be able to act accordingly. For example, if we push down 
> {{a = 1}} but {{a}} is missing in the underlying physical file, all records 
> in this file should be dropped, since {{a}} is always null. On the other 
> hand, if we push down {{a IS NULL}}, all records should be preserved.
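
To make the null-filling rule from the description concrete, here is a minimal sketch in plain Java. It does not use the parquet-mr filter API: the class, enum, and method names below are all hypothetical and exist only for illustration. It assumes a file's schema can be reduced to a set of column names, and that an equality predicate compares against a non-null literal such as {{1}}.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: none of these names come from parquet-mr.
class MissingColumnFilter {

    /** The two predicate shapes used as examples in the issue description. */
    enum Op { EQ_NON_NULL_LITERAL, IS_NULL }

    /** What can be decided for a whole file before reading any record. */
    enum Outcome { DROP_ALL, KEEP_ALL, EVALUATE_PER_RECORD }

    /**
     * Decides how a pushed-down predicate on `column` applies to a file
     * whose schema contains exactly `fileColumns`. A column absent from
     * the file is treated as if every value in it were null.
     */
    static Outcome resolve(Op op, String column, Set<String> fileColumns) {
        if (fileColumns.contains(column)) {
            // Column is physically present: nothing can be decided up front.
            return Outcome.EVALUATE_PER_RECORD;
        }
        switch (op) {
            case EQ_NON_NULL_LITERAL:
                // null = <literal> never matches, so no record survives.
                return Outcome.DROP_ALL;
            case IS_NULL:
                // Every value is null, so every record matches.
                return Outcome.KEEP_ALL;
            default:
                throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file written before column "a" was added to the table.
        Set<String> fileColumns = new HashSet<>(Arrays.asList("b", "c"));

        System.out.println(resolve(Op.EQ_NON_NULL_LITERAL, "a", fileColumns)); // DROP_ALL
        System.out.println(resolve(Op.IS_NULL, "a", fileColumns));             // KEEP_ALL
        System.out.println(resolve(Op.IS_NULL, "b", fileColumns));             // EVALUATE_PER_RECORD
    }
}
```

The point of the sketch is that a missing column becomes a per-file decision rather than a validation failure. How this would be wired into the actual predicate validation in parquet-mr is not shown here.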



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
