GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/22029
[SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields
## What changes were proposed in this pull request?
Spark's IN operator behaves differently from other RDBMS when structs
containing NULL fields are involved: Spark returns `false`, while other RDBMS
return `NULL`. This matters most for NOT IN filters. In that scenario Spark
does not filter out rows whose structs contain NULLs, whereas other RDBMS do.
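The following minimal Python sketch makes the difference concrete by modeling SQL three-valued logic for `IN` over structs. Structs are represented as tuples, and `None` stands in for NULL. It assumes the standard SQL rule that structs are compared field by field with `AND` and that `IN` is an `OR` over the individual equality comparisons. This is an illustration only, not Spark's implementation; the function names are hypothetical, and the exact rule the patch adopts may differ in edge cases such as the second example.

```python
from typing import Optional, Sequence, Tuple

# SQL three-valued logic: True, False, or None (UNKNOWN / NULL).
TV = Optional[bool]


def and3(a: TV, b: TV) -> TV:
    # FALSE dominates AND; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: TV, b: TV) -> TV:
    # TRUE dominates OR; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: TV) -> TV:
    # NOT UNKNOWN is still UNKNOWN.
    return None if a is None else not a


def struct_eq(left: Tuple, right: Tuple) -> TV:
    # Field-wise comparison combined with AND; a NULL field makes that field UNKNOWN.
    result: TV = True
    for l, r in zip(left, right):
        field: TV = None if l is None or r is None else l == r
        result = and3(result, field)
    return result


def in3(value: Tuple, candidates: Sequence[Tuple]) -> TV:
    # value IN (c1, ..., cn) is (value = c1) OR ... OR (value = cn).
    result: TV = False
    for c in candidates:
        result = or3(result, struct_eq(value, c))
    return result


print(in3((1, None), [(1, 2)]))        # None: (1 = 1) AND (NULL = 2) is UNKNOWN
print(in3((1, None), [(2, 2)]))        # False: the first field already differs
print(not3(in3((1, None), [(1, 2)])))  # None: NOT IN is UNKNOWN as well
```

A `WHERE ... NOT IN` filter keeps a row only when the predicate is exactly TRUE. Under this model, the `(1, NULL)` row in the first example would therefore be dropped, which is the RDBMS behavior described above.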
The PR proposes changing Spark's IN operator behavior to align it with other
RDBMS. It also introduces a flag that users can set to switch back to the
previous behavior.
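The effect of such a flag on a NOT IN filter can be pictured with the small Python sketch below. A hypothetical `legacy` parameter switches between two modes. In the old mode, a NULL field just makes the comparison false, so IN never returns NULL. In the new mode, NULL propagates. The parameter name and both helper functions are assumptions for illustration: the actual configuration key is not named here, and the legacy branch only approximates the old `false` result rather than reproducing Spark's exact comparison rules.

```python
from typing import List, Optional, Sequence, Tuple

TV = Optional[bool]  # True, False, or None (SQL NULL / UNKNOWN)


def struct_in(value: Tuple, candidates: Sequence[Tuple], legacy: bool) -> TV:
    """Evaluate `value IN candidates` for structs modeled as tuples.

    legacy=True:  hypothetical old behavior, NULL fields never match and the result is never NULL.
    legacy=False: three-valued logic, an undecidable comparison makes the result NULL.
    """
    saw_unknown = False
    for cand in candidates:
        fields = list(zip(value, cand))
        if any(l is not None and r is not None and l != r for l, r in fields):
            continue  # some field definitely differs: this comparison is FALSE
        if any(l is None or r is None for l, r in fields):
            if not legacy:
                saw_unknown = True  # new behavior: remember the UNKNOWN comparison
            continue  # legacy behavior: treat the NULL comparison as FALSE
        return True  # all fields are non-NULL and equal
    return None if saw_unknown else False


def not_in_filter(rows: List[Tuple], candidates: Sequence[Tuple], legacy: bool) -> List[Tuple]:
    # WHERE row NOT IN (...) keeps a row only when NOT (row IN ...) is exactly TRUE,
    # i.e. when the IN result is FALSE; a NULL result is filtered out.
    return [row for row in rows if struct_in(row, candidates, legacy) is False]


rows = [(1, 2), (1, None), (3, 4)]
print(not_in_filter(rows, [(1, 2)], legacy=True))   # [(1, None), (3, 4)]
print(not_in_filter(rows, [(1, 2)], legacy=False))  # [(3, 4)]
```

In the legacy mode, the row containing a NULL field survives the NOT IN filter. With NULL propagation it is filtered out, matching the other-RDBMS behavior the PR targets.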
## How was this patch tested?
Added unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-24395
Alternatively, you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22029.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22029
----
commit 54ee21ad903827baf1117356f692370225c8662a
Author: Marco Gaido <marcogaido91@...>
Date: 2018-08-07T15:52:17Z
[SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields
----