GitHub user dilipbiswal opened a pull request:
https://github.com/apache/spark/pull/22141
[SPARK-25154] Support NOT IN sub-queries inside nested OR conditions.
## What changes were proposed in this pull request?
Currently, NOT IN subqueries (null-aware predicate subqueries) are not
allowed inside OR expressions. This condition is caught in
`checkAnalysis`, which throws an error.
This PR enhances the subquery rewrite so that such queries are supported.
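For context, NOT IN is null-aware under SQL's three-valued logic. `b NOT IN (SELECT c FROM s2)` is FALSE when `b` matches some `c`. It is NULL when `b` is NULL and the subquery is non-empty, or when there is no match but the subquery returns a NULL. Otherwise it is TRUE, including over an empty subquery. The minimal Python model below spells this out, with `None` standing in for NULL. The helper names are hypothetical and this is not Spark code.

```python
from typing import Iterable, Optional

# SQL three-valued logic: True, False, or None (standing in for NULL).
Bool3 = Optional[bool]


def sql_eq(x: Optional[int], y: Optional[int]) -> Bool3:
    """SQL `x = y`: NULL if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y


def sql_or(p: Bool3, q: Bool3) -> Bool3:
    """SQL OR: TRUE wins over NULL, and NULL wins over FALSE."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def sql_not(p: Bool3) -> Bool3:
    """SQL NOT: NULL stays NULL."""
    return None if p is None else not p


def sql_in(value: Optional[int], rows: Iterable[Optional[int]]) -> Bool3:
    """`value IN (subquery)` is an OR of `value = c` over every row c."""
    result: Bool3 = False  # IN over an empty subquery is FALSE
    for c in rows:
        result = sql_or(result, sql_eq(value, c))
    return result


def sql_not_in(value: Optional[int], rows: Iterable[Optional[int]]) -> Bool3:
    """`value NOT IN (subquery)` negates the IN result; NULL stays NULL."""
    return sql_not(sql_in(value, rows))


print(sql_not_in(1, [2, 3]))                    # True
print(sql_not_in(1, [2, None]))                 # None
print(sql_or(False, sql_not_in(1, [2, None])))  # None, so a Filter drops the row
```

Because the NOT IN operand can be NULL rather than just TRUE or FALSE, it cannot be dropped into an OR as a plain anti join, which is why the rewrite needs special handling in this position.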
Query
```SQL
SELECT * FROM s1 WHERE a > 5 or b NOT IN (SELECT c FROM s2);
```
Optimized Plan
```
a: int, b: int
Project [a#16, b#17]
+- Filter ((a#16 > 5) || NOT b#17 IN (list#13 []))
   :  +- Project [c#18]
   :     +- SubqueryAlias `default`.`s2`
   :        +- HiveTableRelation `default`.`s2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c#18, d#19]
   +- SubqueryAlias `default`.`s1`
      +- HiveTableRelation `default`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#16, b#17]
```
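One way to see how a null-aware NOT IN can be rewritten inside an OR is to compute, for each outer row, a boolean existence flag using the null-aware match condition `b = c OR (b = c) IS NULL`, and then filter on `a > 5 OR NOT exists`. This is similar in spirit to Spark's `ExistenceJoin`, but the sketch below is an assumption for illustration, not necessarily the plan this patch generates. It checks the rewrite against a direct three-valued evaluation of the example query on hypothetical toy data for `s1` and `s2`.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[int]]  # hypothetical (a, b) rows of s1


def reference_filter(s1: List[Row], s2_c: List[Optional[int]]) -> List[Row]:
    """Evaluate `a > 5 OR b NOT IN (SELECT c FROM s2)` with three-valued
    logic (None = NULL) and keep only rows where it is TRUE."""
    non_null_c = [c for c in s2_c if c is not None]
    kept = []
    for a, b in s1:
        gt: Optional[bool] = None if a is None else a > 5
        if b is not None and b in non_null_c:
            not_in: Optional[bool] = False  # a definite match
        elif s2_c and (b is None or None in s2_c):
            not_in = None  # some comparison is NULL and nothing matched
        else:
            not_in = True  # empty subquery, or no match and no NULLs
        # An OR is TRUE exactly when at least one side is TRUE.
        if gt is True or not_in is True:
            kept.append((a, b))
    return kept


def rewritten_filter(s1: List[Row], s2_c: List[Optional[int]]) -> List[Row]:
    """Existence-style rewrite (hypothetical): `exists` is TRUE if any c
    satisfies `b = c OR (b = c) IS NULL`; keep rows where
    `a > 5 OR NOT exists` is TRUE. `exists` is never NULL."""
    kept = []
    for a, b in s1:
        exists = any(b is None or c is None or b == c for c in s2_c)
        gt = a is not None and a > 5  # a NULL comparison is not TRUE
        if gt or not exists:
            kept.append((a, b))
    return kept


s1 = [(1, 1), (1, 2), (6, 1), (None, 3), (1, None), (7, None)]
for s2_c in ([1, None], [1], [], [None]):
    assert reference_filter(s1, s2_c) == rewritten_filter(s1, s2_c)
print(rewritten_filter(s1, [1, None]))  # [(6, 1), (7, None)]
```

The existence flag is always TRUE or FALSE, so NULL never reaches the OR. That is safe for a top-level Filter, which keeps only rows whose predicate is TRUE and drops both NULL and FALSE alike.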
## How was this patch tested?
Added new tests in SQLQueryTestSuite, RewriteSubquerySuite, and SubquerySuite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dilipbiswal/spark SPARK-25154
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22141.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22141
----
commit 473bfb500b07626ff42a9e5ddc167970299bde21
Author: Dilip Biswal <dbiswal@...>
Date: 2018-08-18T21:22:37Z
[SPARK-25154] Support NOT IN sub-queries inside nested OR conditions.
----