GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/12214

    [SPARK-12610][SQL] Left Anti Join

    ### What changes were proposed in this pull request?
    
    This PR adds support for `LEFT ANTI JOIN` to Spark SQL. A `LEFT ANTI JOIN` 
is the exact opposite of a `LEFT SEMI JOIN` and can be used to identify rows in 
one dataset that are not in another dataset. Note that a `null` key on the left 
side of the join cannot match any row on the right-hand side of the join; as a 
result, a left anti join always selects a left row that has a `null` in one or 
more of its join keys.
    
    We currently add support for the following SQL join syntax:
    
        SELECT *
        FROM   tbl1 A
               LEFT ANTI JOIN tbl2 B
               ON A.Id = B.Id
    
    Or using a dataframe:
    
        tbl1.as("a").join(tbl2.as("b"), $"a.id" === $"b.id", "left_anti")
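    
    To make the semantics concrete, here is a minimal plain-Scala sketch of 
what a left anti join on a single key returns, including the `null` behaviour 
noted above. It is only a model and not the implementation in this PR: the 
`Row` case class, the `leftAnti` helper and the sample data are hypothetical, 
and `None` stands in for a SQL `null` key.

```scala
// Plain-Scala model of LEFT ANTI JOIN on one key column (no Spark required).
// Option[Int] stands in for a nullable key: None plays the role of SQL NULL.
object LeftAntiJoinSketch {
  // Hypothetical row type, used only for this illustration.
  final case class Row(id: Option[Int], name: String)

  // Keep each left row for which no right row has an equal, non-null key.
  def leftAnti(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    // NULL keys on the right can never satisfy A.Id = B.Id, so drop them.
    val rightKeys: Set[Int] = right.flatMap(_.id).toSet
    // A None key on the left never matches either, so that row is kept.
    left.filter(row => !row.id.exists(rightKeys.contains))
  }

  def main(args: Array[String]): Unit = {
    val tbl1 = Seq(Row(Some(1), "a"), Row(Some(2), "b"), Row(None, "c"))
    val tbl2 = Seq(Row(Some(2), "x"), Row(None, "y"))
    leftAnti(tbl1, tbl2).foreach(println)
  }
}
```

    With this sample data the rows with key `1` and with the `null` key 
survive, while the row with key `2` is dropped because `tbl2` contains a 
matching key.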
    
    This PR serves as the basis for implementing `NOT EXISTS` and `NOT 
IN (...)` correlated sub-queries. It would also serve as a good basis for 
implementing a more efficient `EXCEPT` operator.
    
    The PR has been (loosely) based on PRs by both @davies 
(https://github.com/apache/spark/pull/10706) and @chenghao-intel 
(https://github.com/apache/spark/pull/10563); credit should be given where 
credit is due.
    
    This PR adds support for `LEFT ANTI JOIN` to `BroadcastHashJoin` 
(including code generation), `ShuffledHashJoin` and `BroadcastNestedLoopJoin`. 
    
    ### How was this patch tested?
    
    Added tests to `JoinSuite` and ported `ExistenceJoinSuite` from 
https://github.com/apache/spark/pull/10563.
    
    cc @davies @chenghao-intel @rxin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-12610

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12214
    
----
commit 05eb0d81831693309ced05d77f555ef50a8477e2
Author: Herman van Hovell <[email protected]>
Date:   2016-04-05T22:50:34Z

    WIP - left anti join

commit 5df21dff6286cf8326a5a84eb554e06c6b171545
Author: Herman van Hovell <[email protected]>
Date:   2016-04-06T19:34:10Z

    Fix bugs & clean-up

----

