HyukjinKwon commented on a change in pull request #27854: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source
URL: https://github.com/apache/spark/pull/27854#discussion_r389560513
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ##########
 @@ -777,7 +777,18 @@ case class SchemaOfJson(
   override def eval(v: InternalRow): Any = {
     val dt = Utils.tryWithResource(CreateJacksonParser.utf8String(jsonFactory, json)) { parser =>
       parser.nextToken()
-      jsonInferSchema.inferField(parser)
+      // To match with schema inference from JSON datasource.
+      jsonInferSchema.inferField(parser) match {
+        case st: StructType =>
+          jsonInferSchema.canonicalizeType(st, jsonOptions).getOrElse(StructType(Nil))
+        case at: ArrayType if at.elementType.isInstanceOf[StructType] =>
+          jsonInferSchema
+            .canonicalizeType(at.elementType, jsonOptions)
+            .map(ArrayType(_, containsNull = at.containsNull))
+            .getOrElse(ArrayType(StructType(Nil), containsNull = at.containsNull))
+        case other: DataType =>
+          jsonInferSchema.canonicalizeType(other, jsonOptions).getOrElse(StringType)
+      }
 
 Review comment:
   This is the actual fix.
   
   The results differ from the JSON datasource for the following reasons:
   
   1. The JSON datasource always expects `StructType` as the root type. An array of JSON objects is still inferred as `StructType`, and other types are disallowed as a root type.
   2. `schema_of_json` infers the type as is. An array of JSON objects is inferred as `ArrayType(StructType)`, and other types are allowed as a root type.
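
   To make the two points above concrete, here is a minimal sketch that runs the same JSON document through both inference paths. It is an illustration only, not part of this PR: the object name `SchemaInferenceDemo`, its helper methods, and the sample input are assumptions, and the exact schema strings printed depend on the Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, schema_of_json}
import org.apache.spark.sql.types.StructType

// Hypothetical demo object; the names here are illustrative, not from the PR.
object SchemaInferenceDemo {
  // Schema inferred by the JSON datasource for a single JSON document.
  // The datasource always produces a StructType at the root.
  def datasourceSchema(spark: SparkSession, json: String): StructType = {
    import spark.implicits._
    spark.read.json(Seq(json).toDS()).schema
  }

  // Schema string reported by schema_of_json for the same document.
  // The root type is reported as is, so it may be an array or a primitive.
  def schemaOfJson(spark: SparkSession, json: String): String =
    spark.range(1).select(schema_of_json(lit(json))).head().getString(0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("schema-inference-demo")
      .getOrCreate()
    try {
      val arrayOfObjects = """[{"a": 1}, {"a": 2}]"""
      // Datasource: each array element becomes a row, so the root is a struct.
      println(datasourceSchema(spark, arrayOfObjects).simpleString)
      // schema_of_json: the root stays an array of structs.
      println(schemaOfJson(spark, arrayOfObjects))
    } finally {
      spark.stop()
    }
  }
}
```

   With an array of JSON objects as input, the first call should report a struct and the second an array of structs, which is the mismatch this change has to account for when canonicalizing the inferred type.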

