[GitHub] francis0407 commented on a change in pull request #23783: [SPARK-26854][SQL] Support ANY/SOME subquery

GitBox Wed, 20 Feb 2019 23:38:31 -0800

francis0407 commented on a change in pull request #23783: [SPARK-26854][SQL] 
Support ANY/SOME subquery
URL: https://github.com/apache/spark/pull/23783#discussion_r258807816


 ##########
 File path: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/any-subquery.sql
 ##########
 @@ -0,0 +1,184 @@
+-- A basic test suite for ANY/SOME predicate
 
 Review comment:
   I tried these queries in PostgreSQL and SQL Server, there are two things 
different:
   1. When compare muti-columns like this:
   ```SQL
   SELECT t1a, t1b
   FROM   t1
   WHERE  (t1b, t1c) <= ANY (SELECT t2b, t2c
                             FROM   t2
                             WHERE  t2c > 12)
   ```
   PostgreSQL output:
   ```
   "val1a"      6
   "val1b"      8
   "val1c"      8
   "val1e"      10
   "val1e"      10
   "val1d"      10
   "val1a"      6
   "val1e"      10
   ```
   But Spark SQL output:
   ```
   val1a        6
   val1a        6
   val1b        8
   val1c        8
   ```
   This is because PostgreSQL returns `True` for `(10, null) < (12, 16)`.
   
   2. A more important problem is that, maybe we can't simply regard `x != 
ANY(query)` as `Not(x = ANY(query))`. Both PostgreSQL and SQL Server explain it 
as `Semi Join(x <> y)`, e.g.,
   ```SQL
   SELECT t1a, t1b
   FROM   t1
   WHERE  t1a != ANY (SELECT t2a
                      FROM   t2)
   ORDER BY t1a
   ```
   PostgreSQL' plan:
   ```
   Sort  (cost=2074.04..2074.69 rows=260 width=84)
     Sort Key: t1.t1a
     ->  Nested Loop Semi Join  (cost=0.00..2063.62 rows=260 width=84)
           Join Filter: ((t1.t1a)::text <> (t2.t2a)::text)
           ->  Seq Scan on t1  (cost=0.00..15.20 rows=520 width=84)
           ->  Materialize  (cost=0.00..17.80 rows=520 width=82)
                 ->  Seq Scan on t2  (cost=0.00..15.20 rows=520 width=82)
   ```
   
   So we need to handle this again...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] francis0407 commented on a change in pull request #23783: [SPARK-26854][SQL] Support ANY/SOME subquery

Reply via email to