GitHub user kevinyu98 opened a pull request:
https://github.com/apache/spark/pull/9720
[SPARK-11447][SQL] change NullType to StringType during binaryComparison
between NullType and StringType
When the PromoteStrings rule runs, if one side of a BinaryComparison is
StringType and the other side is not, the current code promotes (casts) the
StringType side to DoubleType. If the string does not contain a number, the
cast yields null. A <=> (NULL-safe equal) comparison against a null literal
then evaluates to true for every such row, so the filter removes nothing,
which causes the problem reported in this JIRA.
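The effect can be illustrated without Spark. The following is a minimal
sketch in plain Scala, not the Catalyst code: the names NullSafeEqualSketch,
castToDouble and nullSafeEq are hypothetical, Option stands in for a nullable
SQL value, and the cast is only a simplified model of a string-to-double cast.

```scala
import scala.util.Try

// Simplified model only; these names are illustrative and not part of Spark.
object NullSafeEqualSketch {
  // Models casting a string to double: a non-numeric string becomes
  // SQL NULL, represented here as None.
  def castToDouble(s: Option[String]): Option[Double] =
    s.flatMap(v => Try(v.toDouble).toOption)

  // Models <=>: NULL <=> NULL is true, NULL <=> non-null is false.
  def nullSafeEq[A](left: Option[A], right: Option[A]): Boolean =
    left == right

  def main(args: Array[String]): Unit = {
    val rows: Seq[Option[String]] = Seq(Some("abc"), Some("1.5"), None)

    // With the promotion: the string side is cast to double first.
    val withCast = rows.filter(r => nullSafeEq(castToDouble(r), None))

    // Without the promotion: the string is compared to NULL directly.
    val withoutCast = rows.filter(r => nullSafeEq(r, None))

    println(s"with cast:    $withCast")
    println(s"without cast: $withoutCast")
  }
}
```

In this model the cast makes the non-numeric row "abc" compare as equal to
NULL, so it is kept together with the genuinely null row, while the direct
comparison keeps only the null row.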
I propose the changes in this PR. Could you review my code changes?
This problem only happens for <=>; the other operators work fine:
scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> df.registerTempTable("DF")
scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]
scala> res27.show
+------+
|column|
+------+
+------+
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kevinyu98/spark working_on_spark-11447
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9720.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9720
----
commit b53b85cad4f5fced9ba003351d5a9af1eb5111fc
Author: Kevin Yu <[email protected]>
Date: 2015-11-13T18:11:59Z
[SPARK-11447]Check NullType before Promote StringType
commit bb705cae18032fcee8f8a532be464f0a995b27cb
Author: Kevin Yu <[email protected]>
Date: 2015-11-15T06:41:48Z
add testcase in ColumnExpressionSuite
----