GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/22718
[SPARK-25714] [BACKPORT-2.3] Fix Null Handling in the Optimizer rule
BooleanSimplification
This PR is to backport https://github.com/apache/spark/pull/22702 to branch
2.3.
---
## What changes were proposed in this pull request?
```Scala
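import org.apache.spark.sql.SaveMode
import spark.implicits._ // needed for toDF; already in scope in spark-shell

// The second row has a NULL in col1, which is what triggers the bug.
val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
val df2 = spark.read.parquet("/tmp/test1")
df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()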
```
Before this PR, the query returns both rows. After the fix, it returns only
`Row("abc", 1)`. The PR fixes a NULL-handling bug in BooleanSimplification
that was introduced in the Spark 1.6 release.
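To illustrate why NULL matters here, the following is a minimal sketch of SQL three-valued logic in plain Scala, with `None` standing in for NULL. It is not Spark's implementation. It assumes the problematic rewrite was of the shape `a OR (NOT a AND b)` to `a OR b`, and that `col1 != 'abc'` is treated as `NOT (col1 = 'abc')`. The object `ThreeValuedLogic` and all of its members are hypothetical names used only for this example.

```scala
// Kleene three-valued logic; None stands in for SQL NULL.
// All names here are illustrative and are not part of Spark.
object ThreeValuedLogic {
  type Tri = Option[Boolean]

  def not(a: Tri): Tri = a.map(!_)

  def and(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // Predicate as written in the query: a OR (NOT a AND b)
  def original(a: Tri, b: Tri): Tri = or(a, and(not(a), b))

  // Assumed over-simplified form: a OR b
  def simplified(a: Tri, b: Tri): Tri = or(a, b)

  // A filter keeps a row only when the predicate is exactly TRUE.
  def keeps(p: Tri): Boolean = p.contains(true)
}
```

For the row `(null, 3)`, `a` (that is, `col1 = 'abc'`) is NULL and `b` (that is, `col2 == 3`) is TRUE. `original(None, Some(true))` evaluates to `None`, so the row is filtered out, whereas `simplified(None, Some(true))` evaluates to `Some(true)`, so the row is kept. The two forms agree whenever `a` is non-NULL, which is why the rewrite is only unsafe for nullable operands.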
## How was this patch tested?
Added test cases
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark cherrypickSPARK-25714
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22718.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22718
----
commit 8303483832ff3f28bfc907c7522254c1ab5f9808
Author: gatorsmile <gatorsmile@...>
Date: 2018-10-14T03:52:26Z
fix.
----