GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/7768
[SPARK-9372] [SQL] [WIP] Filter nulls in join keys
This PR adds an optimization rule, `FilterNullsInJoinKey`, that inserts a
`Filter` before join operators to drop rows whose join keys are null.
This optimization is guarded by a new SQL conf,
`spark.sql.advancedOptimization`.
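
To illustrate the idea behind the rule, here is a minimal sketch in plain Python. It is not the code from this PR and does not use Spark. The helper names `inner_equi_join` and `filter_null_keys` are hypothetical, and the sketch covers only an inner equi-join, where a null key never matches anything, so dropping null-keyed rows before the join leaves the result unchanged.

```python
from collections import defaultdict


def inner_equi_join(left, right, key):
    """Hash-based inner equi-join with SQL semantics: a None key matches nothing."""
    # Build a hash index over the right side, skipping null keys.
    index = defaultdict(list)
    for row in right:
        if row[key] is not None:
            index[row[key]].append(row)

    # Probe the index with each left row that has a non-null key.
    joined = []
    for left_row in left:
        if left_row[key] is None:
            continue
        for right_row in index.get(left_row[key], []):
            joined.append((left_row, right_row))
    return joined


def filter_null_keys(rows, key):
    """Hypothetical stand-in for the added Filter: drop rows whose join key is None."""
    return [row for row in rows if row[key] is not None]


left = [{"k": 1, "a": "x"}, {"k": None, "a": "y"}, {"k": 2, "a": "z"}]
right = [{"k": 1, "b": "p"}, {"k": None, "b": "q"}, {"k": 3, "b": "r"}]

# Join without pre-filtering, then with null-keyed rows removed first.
plain = inner_equi_join(left, right, "k")
filtered = inner_equi_join(
    filter_null_keys(left, "k"), filter_null_keys(right, "k"), "k"
)
```

Both `plain` and `filtered` contain the same joined rows; the pre-filtered version simply feeds fewer rows into the join. Join types that must preserve unmatched rows would need separate handling, which this sketch does not attempt.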
The code in this PR was authored by @yhuai; I'm opening this PR to factor
out this change from #7685, a larger pull request which contains two other
optimizations.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark filter-nulls-in-join-key
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7768.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7768
----
commit 220112906737b3db668513a024423b35a2c2f32a
Author: Yin Huai <[email protected]>
Date: 2015-07-23T19:21:08Z
Filter out rows that will not be joined in equal joins early.
commit d5b84c399c6966ad509276b3f146948ff06e5ca4
Author: Yin Huai <[email protected]>
Date: 2015-07-24T01:48:47Z
Do not add unnessary filters.
commit 69bb0724eb1dd92d20afdde4b607d37bc4d5e4ca
Author: Yin Huai <[email protected]>
Date: 2015-07-24T01:49:34Z
Introduce NullSafeHashPartitioning and NullUnsafePartitioning.
commit 7c2d2d87a7182fbc9fc8b35fd75db64e147f0ff7
Author: Yin Huai <[email protected]>
Date: 2015-07-26T22:51:38Z
Bug fix and refactoring.
commit e616d3b0a2fa5836956c15b9f64410683a3ef9db
Author: Yin Huai <[email protected]>
Date: 2015-07-27T03:28:49Z
wip
commit c6667e745b0ce0c24dccd419d8fea10e21d24290
Author: Yin Huai <[email protected]>
Date: 2015-07-27T05:03:46Z
Add PartitioningCollection.
commit f9516b0687a90713f2b401d49418ec8ee081f457
Author: Yin Huai <[email protected]>
Date: 2015-07-27T05:29:48Z
Style
commit d3d2e646d525cc9c6e425ae99020d26bbaab10dc
Author: Yin Huai <[email protected]>
Date: 2015-07-27T21:14:34Z
First round of cleanup.
commit c57a95465a2410fa515d6bbcf3dd0276a19f1d21
Author: Yin Huai <[email protected]>
Date: 2015-07-27T23:39:49Z
Bug fix.
commit 40eeece1238690a9a6c68592c0dd024faf5ac06c
Author: Josh Rosen <[email protected]>
Date: 2015-07-30T02:13:58Z
Merge remote-tracking branch 'origin/master' into filter-nulls-in-join-key
commit 303236bed06817befc2786f3c01e34b071f819f1
Author: Josh Rosen <[email protected]>
Date: 2015-07-30T02:15:55Z
Revert changes that are unrelated to null join key filtering
----