Github user rick-ibm commented on the pull request:
https://github.com/apache/spark/pull/8983#issuecomment-146244545
According to my reading of the SQL Standard,
NULL IN (expr1, ...)
should always evaluate to NULL. Here is my reasoning:
The 2011 SQL Standard, part 2, section 8.4 (in predicate), syntax rule 5
says that
expr IN (expr1, ...)
is equivalent to
expr = ANY (expr1, ...)
Section 8.9 (quantified comparison predicate), general rule 2, subrules (c)
and (d), say that
expr = ANY (expr1, ...)
evaluates to the following:
TRUE if (expr = exprN) is TRUE for at least one of the expressions on the
right side
FALSE if the right side is an empty list or if (expr = exprN) is FALSE
for every exprN on the right side
UNKNOWN (NULL) otherwise
Since (NULL = exprN) is always UNKNOWN and since an IN list must be
non-empty (see the BNF in section 8.4), it follows that
NULL IN (expr1, ...)
always evaluates to UNKNOWN (NULL). So Dilip's transformation of
NULL IN (expr1, ...) -> NULL
looks correct to me. There is no need to cast the expressions on the right
side to a common type. That is, not unless you want to raise syntax errors in
situations where there is no implicit conversion to a common type.
As the following examples show, Postgres, MySQL, and Derby all exhibit the
correct Standard behavior.
Thanks,
-Rick
----------------------------------------------------
MySQL behavior:
mysql> SELECT NULL IN (1, 2, 3);
SELECT NULL IN (1, 2, 3);
+-------------------+
| NULL IN (1, 2, 3) |
+-------------------+
| NULL |
+-------------------+
1 row in set (0.00 sec)
mysql> SELECT NULL IN (1, 2, NULL);
SELECT NULL IN (1, 2, NULL);
+----------------------+
| NULL IN (1, 2, NULL) |
+----------------------+
| NULL |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT NULL IN ();
SELECT NULL IN ();
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right syntax to use near
')' at line 1
----------------------------------------------------
Postgres behavior:
mydb=# SELECT NULL IN (1, 2, 3);
SELECT NULL IN (1, 2, 3);
?column?
----------
(1 row)
mydb=# SELECT NULL IN (1, 2, NULL);
SELECT NULL IN (1, 2, NULL);
?column?
----------
(1 row)
mydb=# SELECT NULL IN ();
SELECT NULL IN ();
ERROR: syntax error at or near ")"
LINE 1: SELECT NULL IN ();
----------------------------------------------------
Derby behavior:
ij> VALUES CAST (NULL AS INT) IN (1, 2, 3);
1
-----
NULL
1 row selected
ij> VALUES CAST (NULL AS INT) IN (1, 2, CAST (NULL AS INT));
1
-----
NULL
1 row selected
ij> VALUES CAST (NULL AS INT) IN ();
ERROR 42X01: Syntax error: Encountered ")" at line 1, column 31.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]