GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/21631
[SPARK-24645][SQL] Skip parsing when csvColumnPruning enabled and partitions scanned only
## What changes were proposed in this pull request?
On master, when `csvColumnPruning` is enabled and only partition columns are scanned,
the query below throws an exception:
```
scala> val dir = "/tmp/spark-csv/csv"
scala> spark.range(10).selectExpr("id % 2 AS p", "id").write.mode("overwrite").partitionBy("p").csv(dir)
scala> spark.read.csv(dir).selectExpr("sum(p)").collect()
18/06/25 13:12:51 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 5)
java.lang.NullPointerException
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert(UnivocityParser.scala:197)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:190)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:309)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:309)
	at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:61)
	...
```
This PR modifies the code to skip CSV parsing in this case.
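The idea can be sketched as follows. This is a simplified, hypothetical illustration, not the actual Spark patch: the names (`Field`, `parseIterator`, `requiredSchema`, `parse`) are illustrative. When the required non-partition schema is empty, i.e. the query selects only partition columns, the reader can emit one empty row per record instead of invoking the CSV parser at all, which avoids the NPE and the parsing cost.

```scala
// Illustrative sketch only; names and structure do not match Spark internals.
case class Field(name: String)

def parseIterator(
    lines: Iterator[String],
    requiredSchema: Seq[Field],
    parse: String => Seq[Any]): Iterator[Seq[Any]] = {
  if (requiredSchema.isEmpty) {
    // Only partition columns are requested: nothing to parse from the file,
    // but one empty row per record keeps counts and aggregates correct.
    lines.map(_ => Seq.empty[Any])
  } else {
    lines.map(parse)
  }
}
```

Under this sketch, `spark.read.csv(dir).selectExpr("sum(p)")` would take the empty-schema branch, since `p` is a partition column and no file columns are required.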
## How was this patch tested?
Added tests in `CSVSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SPARK-24645
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21631.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21631
----
commit 59a7f142ae9c83c76c2bfbff2962c071fc586122
Author: Takeshi Yamamuro <yamamuro@...>
Date: 2018-06-25T04:18:37Z
fix
----