Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/20705
@gatorsmile and @HyukjinKwon.
The two failures are due to a limitation of the current JSON data source
implementation. Here, we can see that the test suite correctly exercises the
target data source.
1. `resolveRelation for a FileFormat DataSource without userSchema scan
filesystem only once`
For the JSON source, the statistics count becomes 2 (see the sketch after the
code block below).
2. `Pre insert nullability check (MapType)`
Since the JSON source saves the data as strings, it raises a
ClassCastException when the given user or table schema is different.
```scala
scala> (Tuple1(Map(1 -> (null: Integer))) :: Nil).toDF("a").write.mode("overwrite").save("/tmp/json")

scala> spark.read.json("/tmp/json").printSchema
root
 |-- a: struct (nullable = true)
 |    |-- 1: string (nullable = true)

scala> (Tuple1(Map(1 -> (null: Integer))) :: Nil).toDF("a").write.mode("overwrite").saveAsTable("map")
18/03/02 21:13:49 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider json. Persisting data source table `default`.`map` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

scala> spark.read.json("/tmp/json").printSchema
root
 |-- a: struct (nullable = true)
 |    |-- 1: string (nullable = true)

scala> spark.table("map").printSchema
root
 |-- a: map (nullable = true)
 |    |-- key: integer
 |    |-- value: integer (valueContainsNull = true)

scala> spark.table("map").show
18/03/02 21:14:12 ERROR Executor: Exception in task 0.0 in stage 10.0 (TID 10)
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
```
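Regarding the first failure, here is a minimal, untested sketch (the path, app
name, and example schema are hypothetical, not taken from the test) of why the
count likely differs: when no user schema is given, the JSON source has to
read the input files once to infer the schema before the actual scan, while
supplying an explicit schema skips that inference pass.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object JsonScanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-scan-sketch")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val path = "/tmp/json_scan_example"  // hypothetical path
    Seq(1, 2, 3).toDF("a").write.mode("overwrite").json(path)

    // Without a user schema, the JSON source infers the schema by reading
    // the input files, which adds an extra pass before the real scan.
    spark.read.json(path).count()

    // With an explicit schema, the inference pass is skipped.
    val schema = StructType(StructField("a", IntegerType) :: Nil)
    spark.read.schema(schema).json(path).count()

    spark.stop()
  }
}
```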
For the JSON format, could you confirm this, @HyukjinKwon?