[ 
https://issues.apache.org/jira/browse/SPARK-44431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743222#comment-17743222
 ] 

Ignite TC Bot commented on SPARK-44431:
---------------------------------------

User 'jchen5' has created a pull request for this issue:
https://github.com/apache/spark/pull/42007

> Wrong semantics for null IN (empty list)
> ----------------------------------------
>
>                 Key: SPARK-44431
>                 URL: https://issues.apache.org/jira/browse/SPARK-44431
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Jack Chen
>            Priority: Major
>
> {{null IN (empty list)}} incorrectly evaluates to null, when it should 
> evaluate to false. (The reason it should be false is because a IN (b1, b2) is 
> defined as a = b1 OR a = b2, and an empty IN list is treated as an empty OR 
> which is false. This is specified by ANSI SQL.)
> Many places in Spark execution (In, InSet, InSubquery) and optimization 
> (OptimizeIn, NullPropagation) implemented this wrong behavior. Also note that 
> the Spark behavior for the null IN (empty list) is inconsistent in some 
> places - literal IN lists generally return null (incorrect), while IN/NOT IN 
> subqueries mostly return false/true, respectively (correct) in this case.
> This is a longstanding correctness issue which has existed since null support 
> for IN expressions was first added to Spark.
> Doc with more details: 
> https://docs.google.com/document/d/15ttcB3OjGx5_WFKHB2COjQUbFHj5LrfNQv_d26o-wmI/edit



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to