[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

maropu Sun, 24 Jun 2018 23:24:01 -0700

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21631
  
    As I described in https://github.com/apache/spark/pull/21625#discussion_r197679077, I found what may be another bug while working on this PR. It occurs when `spark.sql.csv.parser.columnPruning.enabled=false`:
    ```
    ./bin/spark-shell --conf spark.sql.csv.parser.columnPruning.enabled=false
    scala> val dir = "/tmp/spark-csv/csv"
    scala> spark.range(10).selectExpr("id % 2 AS p", "id").write.mode("overwrite").partitionBy("p").csv(dir)
    scala> spark.read.csv(dir).selectExpr("sum(p)").collect()
    18/06/25 13:48:46 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
    java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
            at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
            at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getInt(rows.scala:41)
            ...
    ```
    I wrote a patch that fixes this: https://github.com/apache/spark/compare/master...maropu:SPARK-24645-2



