[ 
https://issues.apache.org/jira/browse/SPARK-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-18269.
---------------------------------
       Resolution: Fixed
         Assignee: Hyukjin Kwon
    Fix Version/s: 2.1.0

> NumberFormatException when reading csv for a nullable column
> ------------------------------------------------------------
>
>                 Key: SPARK-18269
>                 URL: https://issues.apache.org/jira/browse/SPARK-18269
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Jork Zijlstra
>            Assignee: Hyukjin Kwon
>             Fix For: 2.1.0
>
>
> Having a schema with a nullable column throws a 
> java.lang.NumberFormatException: null when the data + delimiter for that 
> column isn't specified in the csv.
> Specifying the schema:
> {code}
> StructType(Array(
>   StructField("id", IntegerType, nullable = false),
>   StructField("underlyingId", IntegerType, true)
> ))
> {code}
> Data (without trailing delimiter to specify the second column):
> {code}
> 1
> {code}
> Read the data:
> {code}
> sparkSession.read
>     .schema(sourceSchema)
>     .option("header", "false")
>     .option("delimiter", """\t""")
>     .csv(files(dates): _*)
>     .rdd
> {code}
> Actual Result: 
> {code}
> java.lang.NumberFormatException: null
>       at java.lang.Integer.parseInt(Integer.java:542)
>       at java.lang.Integer.parseInt(Integer.java:615)
>       at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
>       at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
>       at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:244)
> {code}
> Reason:
> The csv line is parsed into a Map (indexSafeTokens), which is short of one 
> value. So indexSafeTokens(index) throws a NullPointerException reading the 
> optional value which isn't in the Map.
> The NullPointerException is then given to the CSVTypeCast.castTo(datum: 
> String, .....) as the datum value.
> The subsequent NumberFormatException is thrown due to the fact that a 
> NullPointerException cannot be cast into the Type.
> Possible fix:
> - Use the provided schema to parse the line with the correct number of columns
> - Since it's nullable, implement a try/catch around CSVRelation.csvParser 
> indexSafeTokens(index)
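
The failure mode and the first suggested fix can be illustrated without Spark. The sketch below is only an illustration under stated assumptions, not the change merged for 2.1.0: the object NullSafeCsvCast and its methods tokenize and castInt are hypothetical names, and treating an empty token the same as a missing one is an assumption. It pads a short line with nulls up to the schema width, then casts each token so that a missing value in a nullable column becomes None instead of reaching toInt.

```scala
// Hypothetical, Spark-free sketch. None of these names are Spark APIs.
object NullSafeCsvCast {

  // Split one line on the delimiter and size the result to the schema width.
  // Columns missing from the line become explicit null tokens.
  def tokenize(line: String, delimiter: String, width: Int): Array[String] = {
    // Quote the delimiter so it is matched literally; -1 keeps trailing empties.
    val tokens = line.split(java.util.regex.Pattern.quote(delimiter), -1)
    Array.tabulate(width)(i => if (i < tokens.length) tokens(i) else null)
  }

  // Cast a token to Int. A null or empty token (assumption: both mean "missing")
  // yields None for a nullable column and an error for a non-nullable one.
  def castInt(datum: String, name: String, nullable: Boolean): Option[Int] =
    if (datum == null || datum.isEmpty) {
      if (nullable) None
      else throw new IllegalArgumentException(
        s"null value found for non-nullable column '$name'")
    } else {
      Some(datum.toInt)
    }
}
```

With the data from the report, NullSafeCsvCast.tokenize("1", "\t", 2) gives a two-element array whose second element is null, and castInt on that element with nullable = true returns None. Calling toInt directly on a null String, as the stack trace shows castTo doing, throws java.lang.NumberFormatException.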



