GitHub user sameeragarwal opened a pull request:
https://github.com/apache/spark/pull/11372
[WIP][SPARK-13495][SQL] Add Null Filters in the query plan for
Filters/Joins based on their data constraints
## What changes were proposed in this pull request?
This PR adds an optimizer rule that avoids reading NULL values that are not
required for correctness by inserting `isNotNull` filters in the query plan.
These filters are currently inserted beneath existing `Filter` and `Join`
operators and are inferred from those operators' data constraints.
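To illustrate the idea for `Filter` operators, here is a minimal sketch in plain Python. It is a toy model and not the Catalyst rule from this PR: the predicate encoding `(column, op, literal)` and the helper names `holds`, `filter_rows`, and `infer_not_null` are all hypothetical. It assumes SQL semantics where a comparison against NULL is never true, so a predicate such as `a > 5` implies `a IS NOT NULL`.

```python
import operator

# Comparison operators that can never be true when the column is NULL.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def holds(row, pred):
    """Evaluate one (column, op, literal) predicate with SQL-like NULL handling."""
    col, op, lit = pred
    value = row[col]
    if value is None:
        return False  # the comparison is NULL (unknown), so a filter drops the row
    return OPS[op](value, lit)

def filter_rows(rows, preds):
    """Keep rows for which every predicate holds (a conjunctive filter)."""
    return [r for r in rows if all(holds(r, p) for p in preds)]

def infer_not_null(preds):
    """Columns that must be non-NULL for the conjunction to hold."""
    return sorted({col for col, op, _ in preds if op in OPS})

rows = [
    {"a": 7, "b": 1},
    {"a": None, "b": 1},
    {"a": 9, "b": None},
    {"a": 3, "b": 1},
]
preds = [("a", ">", 5)]

# Inferred isNotNull filter, applied beneath the original filter.
not_null_cols = infer_not_null(preds)
prefiltered = [r for r in rows if all(r[c] is not None for c in not_null_cols)]

# The original filter returns the same rows with or without the extra filter.
assert filter_rows(prefiltered, preds) == filter_rows(rows, preds)
```

Only column `a` is inferred as non-NULL here; `b` is not referenced by the predicate, so rows with a NULL `b` are left alone.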
Note: While this optimization is applicable to all types of join, it
primarily benefits `Inner` and `LeftSemi` joins.
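A rough way to see why `Inner` joins benefit: under SQL semantics a NULL join key never equals anything, so rows with NULL keys cannot contribute to the output and can be dropped before the join. The sketch below is a toy nested-loop join in plain Python, not Spark code; the tables, column names, and helpers `is_not_null_filter` and `inner_join` are made up for illustration.

```python
def is_not_null_filter(rows, column):
    """Keep rows whose `column` is not NULL (None)."""
    return [r for r in rows if r[column] is not None]

def inner_join(left, right, left_key, right_key):
    """Nested-loop inner equi-join; NULL keys never match, as in SQL."""
    out = []
    for l in left:
        for r in right:
            lk, rk = l[left_key], r[right_key]
            if lk is not None and rk is not None and lk == rk:
                out.append({**l, **r})
    return out

orders = [
    {"order_id": 1, "cust": 10},
    {"order_id": 2, "cust": None},
    {"order_id": 3, "cust": 11},
]
customers = [
    {"cust_id": 10, "name": "a"},
    {"cust_id": None, "name": "b"},
    {"cust_id": 11, "name": "c"},
]

# Join without and with isNotNull filters beneath the join.
plain = inner_join(orders, customers, "cust", "cust_id")
filtered = inner_join(
    is_not_null_filter(orders, "cust"),
    is_not_null_filter(customers, "cust_id"),
    "cust",
    "cust_id",
)

# Same result, but each join input shrinks from three rows to two.
assert plain == filtered
```

In this toy case the saving is one row per side; the payoff in a real plan would depend on how many NULL keys the inputs contain.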
## How was this patch tested?
WIP
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sameeragarwal/spark gen-isnotnull
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11372.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11372
----
commit 15eac821b5328ce34ab2a279fad2e48f471ccbdc
Author: Sameer Agarwal <[email protected]>
Date: 2016-02-25T07:49:17Z
optimizer rules
commit 06d74da3ad1fd2748c395a143cfdd9f99e16009c
Author: Sameer Agarwal <[email protected]>
Date: 2016-02-25T18:10:50Z
Null filtering
----