Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/20705
@gatorsmile and @HyukjinKwon.
The two failures are due to a limitation of the current JSON data source
implementation. Here, we can see that the test suite correctly exercises the
target data source.
1. `resolveRelation for a FileFormat DataSource without userSchema scan
filesystem only once`
For the JSON source, the statistics count becomes 2 (see the sketch after the
code block below).
2. `Pre insert nullability check (MapType)`
Since the JSON source saves the data as strings, it raises a
ClassCastException when the given user or table schema is different.
```scala
scala> (Tuple1(Map(1 -> (null: Integer))) :: Nil).toDF("a").write.mode("overwrite").save("/tmp/json")

scala> spark.read.json("/tmp/json").printSchema
root
 |-- a: struct (nullable = true)
 |    |-- 1: string (nullable = true)

scala> (Tuple1(Map(1 -> (null: Integer))) :: Nil).toDF("a").write.mode("overwrite").saveAsTable("map")
18/03/02 21:13:49 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider json. Persisting data source table `default`.`map` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

scala> spark.read.json("/tmp/json").printSchema
root
 |-- a: struct (nullable = true)
 |    |-- 1: string (nullable = true)

scala> spark.table("map").printSchema
root
 |-- a: map (nullable = true)
 |    |-- key: integer
 |    |-- value: integer (valueContainsNull = true)

scala> spark.table("map").show
18/03/02 21:14:12 ERROR Executor: Exception in task 0.0 in stage 10.0 (TID 10)
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
```
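Regarding the first failure, here is a minimal, untested sketch (the path, app
name, and example schema are hypothetical, not taken from the test) of why the
count likely differs: when no user schema is given, the JSON source has to
read the input files once to infer the schema before the actual scan, while
supplying an explicit schema skips that inference pass.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object JsonScanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-scan-sketch")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val path = "/tmp/json_scan_example"  // hypothetical path
    Seq(1, 2, 3).toDF("a").write.mode("overwrite").json(path)

    // Without a user schema, the JSON source infers the schema by reading
    // the input files, which adds an extra pass before the real scan.
    spark.read.json(path).count()

    // With an explicit schema, the inference pass is skipped.
    val schema = StructType(StructField("a", IntegerType) :: Nil)
    spark.read.schema(schema).json(path).count()

    spark.stop()
  }
}
```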
For the JSON format, could you confirm this, @HyukjinKwon?