francis0407 commented on a change in pull request #23783: [SPARK-26854][SQL]
Support ANY/SOME subquery
URL: https://github.com/apache/spark/pull/23783#discussion_r258807816
##########
File path:
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/any-subquery.sql
##########
@@ -0,0 +1,184 @@
+-- A basic test suite for ANY/SOME predicate
Review comment:
I tried these queries in PostgreSQL and SQL Server, there are two things
different:
1. When compare muti-columns like this:
```SQL
SELECT t1a, t1b
FROM t1
WHERE (t1b, t1c) <= ANY (SELECT t2b, t2c
FROM t2
WHERE t2c > 12)
```
PostgreSQL output:
```
"val1a" 6
"val1b" 8
"val1c" 8
"val1e" 10
"val1e" 10
"val1d" 10
"val1a" 6
"val1e" 10
```
But Spark SQL output:
```
val1a 6
val1a 6
val1b 8
val1c 8
```
This is because PostgreSQL returns `True` for `(10, null) < (12, 16)`.
2. A more important problem is that, maybe we can't simply regard `x !=
ANY(query)` as `Not(x = ANY(query))`. Both PostgreSQL and SQL Server explain it
as `Semi Join(x <> y)`, e.g.,
```SQL
SELECT t1a, t1b
FROM t1
WHERE t1a != ANY (SELECT t2a
FROM t2)
ORDER BY t1a
```
PostgreSQL' plan:
```
Sort (cost=2074.04..2074.69 rows=260 width=84)
Sort Key: t1.t1a
-> Nested Loop Semi Join (cost=0.00..2063.62 rows=260 width=84)
Join Filter: ((t1.t1a)::text <> (t2.t2a)::text)
-> Seq Scan on t1 (cost=0.00..15.20 rows=520 width=84)
-> Materialize (cost=0.00..17.80 rows=520 width=82)
-> Seq Scan on t2 (cost=0.00..15.20 rows=520 width=82)
```
So we need to handle this again...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]