GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/7768

    [SPARK-9372] [SQL] [WIP] Filter nulls in join keys

    This PR adds an optimization rule, `FilterNullsInJoinKey`, that inserts
    `Filter` operators before join operators to drop rows that have null
    values in their join keys.
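    The rule is safe for equality joins because, under SQL three-valued
    logic, `NULL = x` evaluates to NULL (unknown) rather than true, so a row
    with a null join key can never satisfy the join condition. The sketch
    below is a toy illustration of that property for an inner equi-join
    only. It is not Spark's implementation: the helper names
    (`sql_equals`, `inner_equi_join`, `filter_null_keys`) are hypothetical,
    and it does not model outer joins, where null-key rows on the preserved
    side must still be emitted.

```python
from typing import Optional


def sql_equals(a, b) -> Optional[bool]:
    # SQL three-valued logic: any comparison involving NULL yields NULL
    # (unknown), including NULL = NULL.
    if a is None or b is None:
        return None
    return a == b


def inner_equi_join(left, right):
    # Nested-loop inner join over (key, payload) rows: a pair is kept only
    # when the join condition evaluates to TRUE (NULL/unknown is dropped).
    return [
        (lk, lv, rv)
        for (lk, lv) in left
        for (rk, rv) in right
        if sql_equals(lk, rk) is True
    ]


def filter_null_keys(rows):
    # Hypothetical analogue of the Filter the rule places below the join:
    # keep only rows whose join key is non-null.
    return [(k, v) for (k, v) in rows if k is not None]


left = [(1, "a"), (None, "b"), (2, "c")]
right = [(1, "x"), (None, "y"), (3, "z")]

# Filtering null keys first yields the same inner-join result while feeding
# fewer rows into the join.
unfiltered = inner_equi_join(left, right)
prefiltered = inner_equi_join(filter_null_keys(left), filter_null_keys(right))
assert unfiltered == prefiltered
```

    The benefit in a real engine is that the dropped rows are never
    shuffled or probed, which matters when many join keys are null.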
    
    This optimization is guarded by a new SQL conf,
    `spark.sql.advancedOptimization`.
    
    The code in this PR was authored by @yhuai; I'm opening this PR to factor
    out this change from #7685, a larger pull request which contains two
    other optimizations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark filter-nulls-in-join-key

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7768.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7768
    
----
commit 220112906737b3db668513a024423b35a2c2f32a
Author: Yin Huai <[email protected]>
Date:   2015-07-23T19:21:08Z

    Filter out rows that will not be joined in equal joins early.

commit d5b84c399c6966ad509276b3f146948ff06e5ca4
Author: Yin Huai <[email protected]>
Date:   2015-07-24T01:48:47Z

    Do not add unnecessary filters.

commit 69bb0724eb1dd92d20afdde4b607d37bc4d5e4ca
Author: Yin Huai <[email protected]>
Date:   2015-07-24T01:49:34Z

    Introduce NullSafeHashPartitioning and NullUnsafePartitioning.

commit 7c2d2d87a7182fbc9fc8b35fd75db64e147f0ff7
Author: Yin Huai <[email protected]>
Date:   2015-07-26T22:51:38Z

    Bug fix and refactoring.

commit e616d3b0a2fa5836956c15b9f64410683a3ef9db
Author: Yin Huai <[email protected]>
Date:   2015-07-27T03:28:49Z

    wip

commit c6667e745b0ce0c24dccd419d8fea10e21d24290
Author: Yin Huai <[email protected]>
Date:   2015-07-27T05:03:46Z

    Add PartitioningCollection.

commit f9516b0687a90713f2b401d49418ec8ee081f457
Author: Yin Huai <[email protected]>
Date:   2015-07-27T05:29:48Z

    Style

commit d3d2e646d525cc9c6e425ae99020d26bbaab10dc
Author: Yin Huai <[email protected]>
Date:   2015-07-27T21:14:34Z

    First round of cleanup.

commit c57a95465a2410fa515d6bbcf3dd0276a19f1d21
Author: Yin Huai <[email protected]>
Date:   2015-07-27T23:39:49Z

    Bug fix.

commit 40eeece1238690a9a6c68592c0dd024faf5ac06c
Author: Josh Rosen <[email protected]>
Date:   2015-07-30T02:13:58Z

    Merge remote-tracking branch 'origin/master' into filter-nulls-in-join-key

commit 303236bed06817befc2786f3c01e34b071f819f1
Author: Josh Rosen <[email protected]>
Date:   2015-07-30T02:15:55Z

    Revert changes that are unrelated to null join key filtering

----
