GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/12736

    [SPARK-12660] [SQL] Implement Except by Left Anti Join

    #### What changes were proposed in this pull request?
    Replaces a logical `Except` operator with a `Left-anti Join` operator. This
    way, we can take advantage of all the benefits of join implementations (e.g.
    managed memory, code generation, broadcast joins).
    ```SQL
      SELECT a1, a2 FROM Tab1 EXCEPT SELECT b1, b2 FROM Tab2
      ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT ANTI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
    ```
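The rewrite above can be checked against a small model. This is only a sketch in plain Python, not Spark code: rows are tuples, `None` stands in for SQL NULL, and every helper name (`sql_eq`, `null_safe_eq`, `left_anti_join`, `distinct`, `except_distinct`) is made up for illustration. It suggests why the join condition uses `<=>` rather than `=`: EXCEPT treats two NULLs as the same value, while a plain `=` comparison never matches them.

```python
# Illustrative model only; none of these helpers exist in Spark.

def sql_eq(x, y):
    """Plain SQL '=': comparing with NULL yields NULL (modelled as None)."""
    if x is None or y is None:
        return None
    return x == y

def null_safe_eq(x, y):
    """SQL '<=>': NULL <=> NULL is true, NULL <=> value is false."""
    if x is None or y is None:
        return x is None and y is None
    return x == y

def left_anti_join(left, right, eq):
    """Keep left rows for which no right row satisfies the join condition."""
    return [
        l for l in left
        if not any(all(eq(a, b) is True for a, b in zip(l, r)) for r in right)
    ]

def distinct(rows):
    """Drop duplicate rows, preserving first-seen order."""
    return list(dict.fromkeys(rows))

def except_distinct(left, right):
    """Reference semantics of EXCEPT DISTINCT: a set difference on rows."""
    right_set = set(right)
    return [row for row in distinct(left) if row not in right_set]

tab1 = [(1, "a"), (1, "a"), (2, None), (3, "c")]
tab2 = [(2, None), (4, "d")]

expected = except_distinct(tab1, tab2)
rewritten = distinct(left_anti_join(tab1, tab2, null_safe_eq))
with_plain_eq = distinct(left_anti_join(tab1, tab2, sql_eq))
```

In this model `rewritten` matches `expected`, whereas `with_plain_eq` still contains the row `(2, None)`, because `NULL = NULL` is not true and the anti join therefore never finds a match for it.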
     Note:
     1. This rule is only applicable to EXCEPT DISTINCT. Do not use it for
        EXCEPT ALL.
     2. This rule has to be done after de-duplicating the attributes;
        otherwise, the generated join conditions will be incorrect.
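To illustrate the first note, here is a minimal sketch in plain Python (not Spark code) that contrasts multiset semantics for EXCEPT ALL with the anti-join rewrite. The function names `except_all` and `except_via_anti_join` are hypothetical, and the sample rows are invented for the example.

```python
from collections import Counter

def except_all(left, right):
    """EXCEPT ALL: each right row cancels at most one matching left row."""
    remaining = Counter(left) - Counter(right)
    return sorted(remaining.elements())

def except_via_anti_join(left, right):
    """Model of the rewrite: DISTINCT over a left anti join."""
    right_set = set(right)
    return sorted({row for row in left if row not in right_set})

tab1 = [(1, 10), (1, 10), (1, 10), (2, 20)]
tab2 = [(1, 10)]

all_result = except_all(tab1, tab2)
rewrite_result = except_via_anti_join(tab1, tab2)
```

Under these assumptions `all_result` keeps two copies of `(1, 10)`, while `rewrite_result` keeps none, since an anti join drops every left row that has any match. The two results differ even before DISTINCT is applied, which is why the rule should not fire for EXCEPT ALL.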
    
    #### How was this patch tested?
    Modified and added a few test cases to verify the optimization rule and the
    results of operators.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark exceptByAntiJoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12736
    
----
commit cd6b684e7f0ded4992d9c16b8300191597b4f753
Author: gatorsmile <[email protected]>
Date:   2016-04-27T04:20:33Z

    initial fix

commit 43e7436ab48d78a387927e3b56eb7cf0affc2384
Author: gatorsmile <[email protected]>
Date:   2016-04-27T13:33:29Z

    added a test case.

commit 89fae2a6a6a819b8d65938915dd2fff577c8bb22
Author: gatorsmile <[email protected]>
Date:   2016-04-27T13:33:54Z

    antiJoin fix from Herman

commit 8397f2214f7971601ef60a176bcc406866b5ee8b
Author: gatorsmile <[email protected]>
Date:   2016-04-27T15:10:09Z

    added test cases

commit a104f99d0ed4102fc345411f2cfda5d3a2c104c5
Author: gatorsmile <[email protected]>
Date:   2016-04-27T15:11:02Z

    Merge remote-tracking branch 'upstream/master' into exceptByAntiJoin

commit f825dcaabf9d5bf7ffb20a72166e28f32aeec67a
Author: gatorsmile <[email protected]>
Date:   2016-04-27T15:17:51Z

    style fix.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
