dongjoon-hyun commented on a change in pull request #27854: [SPARK-31065][SQL]
Match schema_of_json to the schema inference of JSON data source
URL: https://github.com/apache/spark/pull/27854#discussion_r389849211
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
##########
@@ -674,4 +674,40 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {
       spark.range(1).select(schema_of_json(input)),
       Seq(Row("struct<id:bigint,price:double>")))
   }
+
+  test("SPARK-31065: schema_of_json - null and empty strings as strings") {
+    Seq("""{"id": null}""", """{"id": ""}""").foreach { input =>
+      checkAnswer(
+        spark.range(1).select(schema_of_json(input)),
+        Seq(Row("struct<id:string>")))
Review comment:
These days, any behavior change is a tricky issue, even when it is a bug
fix. I'm wondering whether this PR is safe (`not a silent behavior change`) in
this case, because Apache Spark 2.4.x through 3.0.0-preview2 return the
following, as mentioned in the PR description. To be sure, pinging
@gatorsmile, @cloud-fan, @marmbrus, @srowen.
```scala
scala> spark.range(1).select(schema_of_json("""{"id": null}""")).show
+----------------------------+
|schema_of_json({"id": null})|
+----------------------------+
| struct<id:null>|
+----------------------------+
```
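For anyone who wants to check the difference locally, here is a minimal sketch that prints what the JSON data source infers next to what `schema_of_json` returns for the two inputs in the new test. The object name `SchemaOfJsonCheck`, its helper methods, and the local `SparkSession` settings are my own assumptions for illustration and are not part of the PR. The `schema_of_json` output depends on the Spark version, so the sketch only prints it.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.functions.schema_of_json

// Hypothetical helper object for illustration only; not part of the PR.
object SchemaOfJsonCheck {

  // Schema inferred by the JSON data source for a single JSON record.
  def dataSourceSchema(spark: SparkSession, json: String): String = {
    val ds = spark.createDataset(Seq(json))(Encoders.STRING)
    spark.read.json(ds).schema.simpleString
  }

  // Schema string reported by the schema_of_json function for the same record.
  def functionSchema(spark: SparkSession, json: String): String =
    spark.range(1).select(schema_of_json(json)).head().getString(0)

  def main(args: Array[String]): Unit = {
    // Assumed local session; adjust master/appName for your environment.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("schema-of-json-check")
      .getOrCreate()
    try {
      Seq("""{"id": null}""", """{"id": ""}""").foreach { json =>
        println(s"input:          $json")
        println(s"data source:    ${dataSourceSchema(spark, json)}")
        println(s"schema_of_json: ${functionSchema(spark, json)}")
      }
    } finally {
      spark.stop()
    }
  }
}
```

If the two printed schemas differ for `{"id": null}` on a given release, that release still has the behavior shown in the output above.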