[GitHub] spark pull request: [SPARK-8654][SQL] Fix Analysis exception when ...

rick-ibm Wed, 07 Oct 2015 09:04:30 -0700

Github user rick-ibm commented on the pull request:

    https://github.com/apache/spark/pull/8983#issuecomment-146244545
  
    According to my reading of the SQL Standard,
    
      NULL IN (expr1, ...)
    
    should always evaluate to NULL. Here is my reasoning:
    
    The 2011 SQL Standard, part 2, section 8.4 (in predicate), syntax rule 5 
says that
    
      expr IN (expr1, ...)
    
    is equivalent to
    
      expr = ANY (expr1, ...)
    
    Section 8.9 (quantified comparison predicate), general rule 2, subrules (c) 
and (d), say that
    
      expr = ANY (expr1, ...)
    
    evaluates to the following:
    
      TRUE if (expr = exprN) is TRUE for at least one of the expressions on the 
right side
    
      FALSE if the right side is an empty list or if (expr = exprN) is FALSE 
for every exprN on the right side
    
      UNKNOWN (NULL) otherwise
    
    Since (NULL = exprN) is always UNKNOWN and since an IN list must be 
non-empty (see the BNF in section 8.4), it follows that
    
      NULL IN (expr1, ...)
    
    always evaluates to UNKNOWN (NULL). So Dilip's transformation of
    
      NULL IN (expr1, ...) -> NULL
    
    looks correct to me. There is no need to cast the expressions on the right 
side to a common type. That is, not unless you want to raise syntax errors in 
situations where there is no implicit conversion to a common type.
    
    As the following examples show, Postgres, MySQL, and Derby all exhibit the 
correct Standard behavior.
    
    Thanks,
    -Rick
    
    ----------------------------------------------------
    
    MySQL behavior:
    
    mysql> SELECT NULL IN (1, 2, 3);
    SELECT NULL IN (1, 2, 3);
    +-------------------+
    | NULL IN (1, 2, 3) |
    +-------------------+
    |              NULL |
    +-------------------+
    1 row in set (0.00 sec)
    
    mysql> SELECT NULL IN (1, 2, NULL);
    SELECT NULL IN (1, 2, NULL);
    +----------------------+
    | NULL IN (1, 2, NULL) |
    +----------------------+
    |                 NULL |
    +----------------------+
    1 row in set (0.00 sec)
    
    mysql> SELECT NULL IN ();
    SELECT NULL IN ();
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual 
that corresponds to your MySQL server version for the right syntax to use near 
')' at line 1
    
    
    
    ----------------------------------------------------
    
    Postgres behavior:
    
    mydb=# SELECT NULL IN (1, 2, 3);
    SELECT NULL IN (1, 2, 3);
     ?column? 
    ----------
     
    (1 row)
    
    mydb=# SELECT NULL IN (1, 2, NULL);
    SELECT NULL IN (1, 2, NULL);
     ?column? 
    ----------
     
    (1 row)
    
    mydb=# SELECT NULL IN ();
    SELECT NULL IN ();
    ERROR:  syntax error at or near ")"
    LINE 1: SELECT NULL IN ();
    
    
    
    ----------------------------------------------------
    
    Derby behavior:
    
    ij> VALUES CAST (NULL AS INT) IN (1, 2, 3);
    1    
    -----
    NULL 
    
    1 row selected
    ij> VALUES CAST (NULL AS INT) IN (1, 2, CAST (NULL AS INT));
    1    
    -----
    NULL 
    
    1 row selected
    ij> VALUES CAST (NULL AS INT) IN ();
    ERROR 42X01: Syntax error: Encountered ")" at line 1, column 31.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-8654][SQL] Fix Analysis exception when ...

Reply via email to