srowen commented on a change in pull request #27854: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source
URL: https://github.com/apache/spark/pull/27854#discussion_r389852012
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
##########
@@ -674,4 +674,40 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {
       spark.range(1).select(schema_of_json(input)),
       Seq(Row("struct<id:bigint,price:double>")))
   }
+
+  test("SPARK-31065: schema_of_json - null and empty strings as strings") {
+    Seq("""{"id": null}""", """{"id": ""}""").foreach { input =>
+      checkAnswer(
+        spark.range(1).select(schema_of_json(input)),
+        Seq(Row("struct<id:string>")))
Review comment:
From what I read here, the OP issue looks like a clean bug fix. Yes, it is a
behavior change, but bug fixes always are. The change you highlight here is
subtler. I think it's reasonable to infer string type rather than null type,
but I would only do it in 3.0, not in Spark 2.x. It would be a release notes item.