GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/22702
[SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification

## What changes were proposed in this pull request?

```Scala
// Assumes a spark-shell session, which provides `spark` and `spark.implicits._`.
import org.apache.spark.sql.SaveMode

val df1 = Seq(("abc", 1), (null, 2)).toDF("col1", "col2")
df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
val df2 = spark.read.parquet("/tmp/test1")
// For the (null, 2) row the predicate evaluates to NULL, so the row must be filtered out.
df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()
```

Before the PR, the query returns both rows. After the fix, it returns only `Row("abc", 1)`.

This PR fixes a bug in NULL handling in BooleanSimplification. The bug was introduced in the Spark 1.6 release.

## How was this patch tested?

Added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark fixBooleanSimplify2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22702

----
commit 5d1dde12b6cb1b3a61f32d678b952ac4ce5b6c0f
Author: gatorsmile <gatorsmile@...>
Date:   2018-10-11T21:38:38Z

    fix

commit a9359abff62017f46f33ef18d7f56f97c885af3d
Author: gatorsmile <gatorsmile@...>
Date:   2018-10-11T21:40:44Z

    style

----
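
To see why only the `("abc", 1)` row should survive the filter in the description, the predicate can be evaluated by hand under SQL's three-valued logic. The following is a minimal sketch that does not use Spark at all: it models NULL as `None` in an `Option[Boolean]`, and the names `ThreeValuedLogic`, `SqlBool`, `sqlEq`, `predicate`, and `keeps` are hypothetical helpers introduced only for this illustration. It also checks one two-valued identity, `a OR NOT a = TRUE`, that stops holding once `a` can be NULL; the description does not say which rewrite in BooleanSimplification was at fault, so this is shown only as an example of the kind of identity that is unsafe with NULLs.

```Scala
// SQL three-valued logic modelled with Option[Boolean]; None stands for NULL.
// All names here are illustrative and are not part of Spark.
object ThreeValuedLogic {
  type SqlBool = Option[Boolean]

  // FALSE dominates AND; otherwise any NULL operand makes the result NULL.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // TRUE dominates OR; otherwise any NULL operand makes the result NULL.
  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // NOT NULL is NULL.
  def not(a: SqlBool): SqlBool = a.map(!_)

  // Comparing a NULL column value with a literal yields NULL.
  def sqlEq[A](x: Option[A], y: A): SqlBool = x.map(_ == y)

  // The predicate from the description:
  //   col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)
  def predicate(col1: Option[String], col2: Option[Int]): SqlBool = {
    val a = sqlEq(col1, "abc")
    or(a, and(not(a), sqlEq(col2, 3)))
  }

  // A filter keeps a row only when the predicate is TRUE; FALSE and NULL both drop it.
  def keeps(p: SqlBool): Boolean = p.contains(true)

  def main(args: Array[String]): Unit = {
    val rows: Seq[(Option[String], Option[Int])] =
      Seq((Some("abc"), Some(1)), (None, Some(2)))

    for ((col1, col2) <- rows) {
      val p = predicate(col1, col2)
      println(s"col1=$col1 col2=$col2 predicate=$p kept=${keeps(p)}")
    }
    // prints:
    //   col1=Some(abc) col2=Some(1) predicate=Some(true) kept=true
    //   col1=None col2=Some(2) predicate=None kept=false

    // `a OR NOT a` is NULL, not TRUE, when a is NULL.
    println(or(None, not(None))) // prints: None
  }
}
```

For the `(null, 2)` row, `col1 = 'abc'` is NULL, `col1 != 'abc' AND col2 == 3` is FALSE because `col2 == 3` is FALSE, and `NULL OR FALSE` is NULL, so the row is dropped. This matches the post-fix result stated in the description.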