HyukjinKwon commented on a change in pull request #27854: [SPARK-31065][SQL] 
Match schema_of_json to the schema inference of JSON data source
URL: https://github.com/apache/spark/pull/27854#discussion_r390060767
 
 

 ##########
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
 ##########
 @@ -674,4 +674,40 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSparkSession {
       spark.range(1).select(schema_of_json(input)),
       Seq(Row("struct<id:bigint,price:double>")))
   }
+
+  test("SPARK-31065: schema_of_json - null and empty strings as strings") {
+    Seq("""{"id": null}""", """{"id": ""}""").foreach { input =>
+      checkAnswer(
+        spark.range(1).select(schema_of_json(input)),
+        Seq(Row("struct<id:string>")))
 
 Review comment:
   @dongjoon-hyun, the problem here is that `struct<id:null>` can't be used as 
a schema. The main purpose of `schema_of_json` is to provide an alternative to 
schema inference for `from_json`.
   
   Currently, the code you mentioned doesn't work with it:
   
   ```scala
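   import org.apache.spark.sql.functions.{from_json, lit, schema_of_json}
   // Infer the schema from a sample document, then reuse it to parse the same document.
   val schemaExpr = schema_of_json(lit("""{"id": ""}"""))
   spark.range(1).select(from_json(lit("""{"id": ""}"""), schemaExpr)).show()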
   ```
   
   **Before:**
   
   ```
   org.apache.spark.sql.catalyst.parser.ParseException:
   ...
   == SQL ==
   struct<id:null>
   ------^^^
   ```
   
   **After:**
   
   ```
   +---------------------+
   |from_json({"id": ""})|
   +---------------------+
   |                   []|
   +---------------------+
   ```
   
   `struct<id:null>` can't be used anywhere because, as we know, `null` isn't 
supported as a type in a DDL-formatted string.
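   
   As a rough way to check this in isolation, the sketch below asks the DDL parser 
   directly whether each schema string is accepted. It assumes a `spark-shell` 
   session with Spark on the classpath; `parsesAsDDL` is a hypothetical helper 
   written for this illustration, not part of Spark.
   
   ```scala
   import scala.util.Try
   import org.apache.spark.sql.types.DataType
   
   // Hypothetical helper (not a Spark API): true if the DDL string parses into a DataType.
   def parsesAsDDL(ddl: String): Boolean = Try(DataType.fromDDL(ddl)).isSuccess
   
   // `null` is not an accepted type name, so this is expected to report a failure,
   // in line with the ParseException quoted above.
   println(s"struct<id:null> parses: ${parsesAsDDL("struct<id:null>")}")
   
   // `string` is a valid DDL type, so this schema can be passed on to from_json.
   println(s"struct<id:string> parses: ${parsesAsDDL("struct<id:string>")}")
   ```
   
   If the first check fails and the second succeeds, that is consistent with 
   returning `struct<id:string>` from `schema_of_json` for null and empty values.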
   
   It's unlikely that users depend on this behaviour, so I personally don't think 
it's worthwhile to add a configuration. I would even doubt that it needs a 
release note if this fix only lands in Spark 3.0.
