GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/3367

    [SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq 
predicates with nulls for Parquet

    Predicates like `a = NULL` and `a < NULL` can't be pushed down since 
Parquet `Lt`, `LtEq`, `Gt` and `GtEq` don't accept null values. Note that `Eq` and 
`NotEq` can only be used with `null` to represent predicates like `a IS NULL` 
and `a IS NOT NULL`.
    
    However, normally this issue doesn't cause an NPE because any value compared 
to `NULL` results in `NULL`, and Spark SQL automatically optimizes out `NULL` 
predicates in the `SimplifyFilters` rule. Only testing code that intentionally 
disables the optimizer may trigger this issue. (That's why this issue is not 
marked as a blocker and I don't think we need to backport this to branch-1.1.)

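
To illustrate why a comparison against `NULL` can be optimized away, here is a minimal sketch of SQL three-valued comparison in plain Python. It is not Spark code: `None` stands in for `NULL`, and `sql_lt` and `filter_rows` are hypothetical helper names introduced only for this example.

```python
from typing import Any, Callable, List, Optional


def sql_lt(a: Any, b: Any) -> Optional[bool]:
    """SQL-style `a < b`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a < b


def filter_rows(rows: List[Any], predicate: Callable[[Any], Optional[bool]]) -> List[Any]:
    """Keep only rows for which the predicate is exactly TRUE.

    A NULL predicate result drops the row, just like FALSE does.
    """
    return [row for row in rows if predicate(row) is True]


rows = [1, 5, None]

# `a < NULL` is NULL for every row, so the filter keeps nothing.
kept_with_null = filter_rows(rows, lambda a: sql_lt(a, None))

# `a < 3` keeps only the non-null value below 3.
kept_with_literal = filter_rows(rows, lambda a: sql_lt(a, 3))
```

Because a predicate such as `a < NULL` can never evaluate to TRUE, an optimizer can drop it before it ever reaches the Parquet filter conversion, which is presumably why the problem only shows up when the optimizer is disabled.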
    This PR restricts `Lt`, `LtEq`, `Gt` and `GtEq` to non-null values only, 
and only uses `Eq` with a null value to push down `IsNull` and `IsNotNull`. It 
also adds support for the Parquet `NotEq` filter for completeness and a (tiny) 
performance gain; `NotEq` is also used to push down `IsNotNull`.
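
The rules described above can be sketched as a small decision function. This is a simplified model in plain Python, not the actual Spark or Parquet code: predicates and filters are represented as tuples, `to_parquet_filter` is a hypothetical name, and mapping `IsNotNull` to `NotEq` with null is an assumption based on the description.

```python
from typing import Any, Optional, Tuple

# A filter is modelled as (operator, column, literal); None as literal means null.
Filter = Tuple[str, str, Any]

COMPARISON_OPS = {"Eq", "NotEq", "Lt", "LtEq", "Gt", "GtEq"}


def to_parquet_filter(op: str, column: str, value: Any = None) -> Optional[Filter]:
    """Return the filter to push down, or None if the predicate must not be pushed down."""
    if op == "IsNull":
        # `a IS NULL` is expressed as Eq with a null value.
        return ("Eq", column, None)
    if op == "IsNotNull":
        # `a IS NOT NULL` is expressed as NotEq with a null value (assumed mapping).
        return ("NotEq", column, None)
    if op in COMPARISON_OPS:
        if value is None:
            # `a = NULL`, `a < NULL`, etc. are never pushed down.
            return None
        return (op, column, value)
    # Anything else is outside this sketch and is not pushed down.
    return None
```

For example, `to_parquet_filter("Lt", "a", None)` yields no filter, while `to_parquet_filter("IsNull", "a")` yields an `Eq` filter with a null value. Keeping the null check in one place means a user-written comparison with a null literal cannot reach a filter that rejects null values.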

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark filters-with-null

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3367.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3367
    
----
commit de7de288e3e609feaee1d70b4cfbfcca624edec2
Author: Cheng Lian <[email protected]>
Date:   2014-11-19T15:36:30Z

    Adds stricter rules for Parquet filters with null

----

