francis0407 commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated 
predicate subquery
URL: https://github.com/apache/spark/pull/24344#issuecomment-485714371
 
 
   I'm still working on this.
   
   When I was dealing with 'null' with `InSubquery`, I found something wrong.
   
   In `catalyst/optimizer/expressions.scala`, line 551, Rule `NullPropagation`:
   ```scala
         // If the value expression is NULL then transform the In expression to 
null literal.
         case In(Literal(null, _), _) => Literal.create(null, BooleanType)
         case InSubquery(Seq(Literal(null, _)), _) => Literal.create(null, 
BooleanType)
   ```
   
   In this piece of code, we transform `null in (subquery)` into `null`.
   Actually, I think this does not meet the SQL standard. `null in (subquery)` 
could be evaluated to `null` only if the subquery returns an non-empty result 
set. If the subquery's result set is empty, it should be evaluated to `false`.
   
   
   For example, in PostgreSQL:
   ```
   test=# select null in (select y from s where false);   
    ?column? 
   ----------
    f
   (1 row)
   
   test=# select null in (select y from s);            
    ?column? 
   ----------
    
   (1 row)
   
   test=# 
   ``` 
   
   Is this a BUG or a feature? cc @cloud-fan 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to