GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/12736
[SPARK-12660] [SQL] Implement Except by Left Anti Join
#### What changes were proposed in this pull request?
Replaces the logical `Except` operator with a left anti join. This way,
`EXCEPT` queries can take advantage of the existing join implementations
(e.g., managed memory, code generation, and broadcast joins).
```SQL
SELECT a1, a2 FROM Tab1 EXCEPT SELECT b1, b2 FROM Tab2;
-- ==> is rewritten as:
SELECT DISTINCT a1, a2 FROM Tab1 LEFT ANTI JOIN Tab2 ON a1 <=> b1 AND a2 <=> b2;
```
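The rewrite can be checked on a small in-memory model. The sketch below is plain Python, not Spark code: rows are tuples, `None` stands in for SQL NULL, and `null_safe_eq`, `except_distinct`, and `distinct_left_anti_join` are hypothetical helpers. It assumes that `EXCEPT` treats two NULLs as equal, which is why the join condition uses the null-safe `<=>` rather than `=`.

```python
# Toy model of the rewrite; rows are tuples and None stands in for SQL NULL.
# Helper names are hypothetical, not Spark APIs.

def null_safe_eq(x, y):
    """Mimic SQL `<=>`: NULL <=> NULL is true; NULL <=> non-NULL is false."""
    if x is None or y is None:
        return x is None and y is None
    return x == y

def except_distinct(left, right):
    """Reference semantics: distinct left rows that do not appear in right.
    Python tuple equality treats None == None, matching how EXCEPT compares NULLs."""
    return set(left) - set(right)

def distinct_left_anti_join(left, right):
    """DISTINCT over a left anti join whose condition ANDs `<=>` on every column."""
    kept = set()
    for l in left:
        has_match = any(
            all(null_safe_eq(a, b) for a, b in zip(l, r)) for r in right
        )
        if not has_match:
            kept.add(l)  # adding to a set provides the DISTINCT
    return kept

if __name__ == "__main__":
    tab1 = [(1, "x"), (1, "x"), (2, None), (3, "z")]
    tab2 = [(2, None), (4, "w")]
    print(sorted(except_distinct(tab1, tab2)))          # [(1, 'x'), (3, 'z')]
    print(sorted(distinct_left_anti_join(tab1, tab2)))  # [(1, 'x'), (3, 'z')]
```

Replacing `null_safe_eq` with plain SQL `=` semantics, where any comparison involving NULL is not true, would keep `(2, None)` in the anti-join result even though `EXCEPT` removes it.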
Note:
1. This rule applies only to `EXCEPT DISTINCT`; it must not be used for `EXCEPT ALL`.
2. This rule must be applied after the attributes have been de-duplicated;
otherwise, the generated join conditions will be incorrect.
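Both notes can be illustrated with a toy Python model (not Spark code; all helper names and the `a#1` ID notation are hypothetical). For note 1, an anti join drops every copy of a matching left row, while `EXCEPT ALL` subtracts multiplicities, so the two disagree once duplicates are present. For note 2, the model assumes attributes are resolved by ID rather than by name, as a rough analogue of Catalyst expression IDs: if both sides of the `EXCEPT` derive from the same relation and still share an ID, the generated condition compares an attribute with itself and matches every row.

```python
from collections import Counter

# --- Note 1: why the rewrite is only valid for EXCEPT DISTINCT -------------

def except_all(left, right):
    """EXCEPT ALL as multiset difference: each right row cancels one left copy."""
    return sorted((Counter(left) - Counter(right)).elements())

def left_anti_join(left, right):
    """Anti join on all columns: drops every left row that has any match."""
    right_rows = set(right)
    return sorted(row for row in left if row not in right_rows)

# --- Note 2: why attributes must be de-duplicated first --------------------
# Toy model: attributes are identified by an ID such as "a#1" rather than by
# name (hypothetical notation, loosely modeled on Catalyst expression IDs).

def anti_join_by_ids(left_rows, right_rows, left_ids, right_ids):
    """Anti join whose condition pairs the i-th left ID with the i-th right ID."""
    condition = list(zip(left_ids, right_ids))
    result = []
    for l in left_rows:
        matched = False
        for r in right_rows:
            # Resolve IDs in one merged map; a right-side ID equal to a
            # left-side ID silently shadows the left value.
            env = dict(zip(left_ids, l))
            env.update(zip(right_ids, r))
            if all(env[x] == env[y] for x, y in condition):
                matched = True
                break
        if not matched:
            result.append(l)
    return result

if __name__ == "__main__":
    left, right = [(1,), (1,), (1,), (2,)], [(1,)]
    print(except_all(left, right))      # [(1,), (1,), (2,)]
    print(left_anti_join(left, right))  # [(2,)]

    t = [(1,), (2,), (3,)]
    lhs = [row for row in t if row[0] > 1]  # both sides derive from t ...
    rhs = [row for row in t if row[0] < 3]  # ... so both expose attribute a#1
    print(anti_join_by_ids(lhs, rhs, ["a#1"], ["a#1"]))  # [] (wrong)
    print(anti_join_by_ids(lhs, rhs, ["a#1"], ["a#2"]))  # [(3,)] (correct)
```

With shared IDs the condition reduces to comparing the right-side value with itself, so every left row is discarded; giving the right side a fresh ID restores the expected result `{3} = {2, 3} EXCEPT {1, 2}`.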
#### How was this patch tested?
Modified existing test cases and added new ones to verify both the
optimization rule and the results of the rewritten operators.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark exceptByAntiJoin
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12736.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12736
----
commit cd6b684e7f0ded4992d9c16b8300191597b4f753
Author: gatorsmile <[email protected]>
Date: 2016-04-27T04:20:33Z
initial fix
commit 43e7436ab48d78a387927e3b56eb7cf0affc2384
Author: gatorsmile <[email protected]>
Date: 2016-04-27T13:33:29Z
added a test case.
commit 89fae2a6a6a819b8d65938915dd2fff577c8bb22
Author: gatorsmile <[email protected]>
Date: 2016-04-27T13:33:54Z
antiJoin fix from Herman
commit 8397f2214f7971601ef60a176bcc406866b5ee8b
Author: gatorsmile <[email protected]>
Date: 2016-04-27T15:10:09Z
added test cases
commit a104f99d0ed4102fc345411f2cfda5d3a2c104c5
Author: gatorsmile <[email protected]>
Date: 2016-04-27T15:11:02Z
Merge remote-tracking branch 'upstream/master' into exceptByAntiJoin
commit f825dcaabf9d5bf7ffb20a72166e28f32aeec67a
Author: gatorsmile <[email protected]>
Date: 2016-04-27T15:17:51Z
style fix.
----