dongjoon-hyun commented on a change in pull request #27854: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source
URL: https://github.com/apache/spark/pull/27854#discussion_r389849211
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
 ##########
 @@ -674,4 +674,40 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {
       spark.range(1).select(schema_of_json(input)),
       Seq(Row("struct<id:bigint,price:double>")))
   }
+
+  test("SPARK-31065: schema_of_json - null and empty strings as strings") {
+    Seq("""{"id": null}""", """{"id": ""}""").foreach { input =>
+      checkAnswer(
+        spark.range(1).select(schema_of_json(input)),
+        Seq(Row("struct<id:string>")))
 
 Review comment:
  These days, behavior changes have become a tricky issue, even when they are
bug fixes. I'm wondering whether this PR is safe (i.e., `not a silent behavior
change`) here, because Apache Spark 2.4.x through 3.0.0-preview2 returns the
following, as mentioned in the PR description. To be sure, pinging
@gatorsmile, @cloud-fan, @marmbrus, and @srowen.
   ```scala
   scala> spark.range(1).select(schema_of_json("""{"id": null}""")).show
   +----------------------------+
   |schema_of_json({"id": null})|
   +----------------------------+
   |             struct<id:null>|
   +----------------------------+
   ```
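To make the concern concrete, here is a hypothetical, heavily simplified model of the two behaviors in plain Scala (no Spark dependency). It assumes that both a JSON `null` and an empty string are first inferred as an undetermined (null) type, that the JSON data source then maps that type to `string`, and that the pre-fix `schema_of_json` reported it as-is. All names (`SchemaOfJsonModel`, `inferLeaf`, `canonicalize`, and so on) are illustrative only and are not Spark's internal APIs.

```scala
// Hypothetical, simplified model of the behavior under discussion.
// This is NOT Spark's implementation; all names here are illustrative only.
object SchemaOfJsonModel {
  sealed trait FieldType { def simpleString: String }
  case object NullField extends FieldType { val simpleString = "null" }
  case object StringField extends FieldType { val simpleString = "string" }

  // Infer the type of one leaf value: None models JSON `null`,
  // Some(s) models a JSON string literal.
  def inferLeaf(value: Option[String]): FieldType = value match {
    case None     => NullField
    case Some("") => NullField // assumption: empty strings are treated like null
    case Some(_)  => StringField
  }

  // Assumed data-source step: an undetermined (null) type becomes string.
  def canonicalize(t: FieldType): FieldType = t match {
    case NullField => StringField
    case other     => other
  }

  // Render fields like the outputs in this thread, e.g. struct<id:string>.
  private def render(fields: Seq[(String, FieldType)]): String =
    fields.map { case (name, t) => s"$name:${t.simpleString}" }.mkString("struct<", ",", ">")

  // Models the reported pre-fix behavior: the raw inferred type is returned.
  def schemaOfJsonBefore(fields: Seq[(String, Option[String])]): String =
    render(fields.map { case (name, v) => name -> inferLeaf(v) })

  // Models the behavior this PR aims for: same canonicalization as the data source.
  def schemaOfJsonAfter(fields: Seq[(String, Option[String])]): String =
    render(fields.map { case (name, v) => name -> canonicalize(inferLeaf(v)) })
}
```

In this model, `schemaOfJsonBefore(Seq("id" -> None))` yields `struct<id:null>` while `schemaOfJsonAfter(Seq("id" -> None))` yields `struct<id:string>`; that difference for identical input is exactly what a user upgrading would observe without any code change.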

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 