[ 
https://issues.apache.org/jira/browse/SPARK-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk resolved SPARK-15125.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

The issue has been fixed by 
https://github.com/apache/spark/commit/7a2d4895c75d4c232c377876b61c05a083eab3c8

> CSV data source recognizes empty quoted strings in the input as null. 
> ----------------------------------------------------------------------
>
>                 Key: SPARK-15125
>                 URL: https://issues.apache.org/jira/browse/SPARK-15125
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Suresh Thalamati
>            Priority: Major
>             Fix For: 2.4.0
>
>
> The CSV data source does not differentiate between empty quoted strings and 
> empty fields; it reads both as null. In some scenarios users want to 
> distinguish these values, especially in the context of SQL, where NULL and 
> the empty string have different meanings. If the input data is a dump from a 
> traditional relational data source, users will see different results for 
> their SQL queries. 
> {code}
> Repro:
> Test Data: (test.csv)
> year,make,model,comment,price
> 2017,Tesla,Mode 3,looks nice.,35000.99
> 2016,Chevy,Bolt,"",29000.00
> 2015,Porsche,"",,
> scala> val df = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").option("nullValue", null).load("/tmp/test.csv")
> df: org.apache.spark.sql.DataFrame = [year: int, make: string ... 3 more fields]
> scala> df.show
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|       null| 29000.0|
> |2015|Porsche|  null|       null|    null|
> +----+-------+------+-----------+--------+
> Expected:
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|           | 29000.0|
> |2015|Porsche|      |       null|    null|
> +----+-------+------+-----------+--------+
> {code}
> I am testing a fix for this issue and will try to submit a PR for it 
> soon. 
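
The distinction requested in the description can be illustrated with a small standalone sketch. This is not Spark's CSV parser and not the code from the fix commit; `EmptyVsNull` and `parseLine` are hypothetical names used only for illustration. The sketch assumes comma-separated fields, double-quote quoting, and no newlines inside fields. An unquoted empty field maps to `None` (standing in for SQL NULL), while a quoted empty string maps to `Some("")`.

```scala
object EmptyVsNull {
  // Hypothetical helper: parse one CSV line into optional fields.
  // Unquoted empty field -> None (NULL); quoted "" -> Some("").
  def parseLine(line: String): Vector[Option[String]] = {
    val fields = Vector.newBuilder[Option[String]]
    val sb = new StringBuilder
    var inQuotes = false   // currently inside a quoted section
    var wasQuoted = false  // current field contained a quoted section
    var i = 0

    // Emit the current field and reset the per-field state.
    def flush(): Unit = {
      fields += (if (sb.isEmpty && !wasQuoted) None else Some(sb.toString))
      sb.clear()
      wasQuoted = false
    }

    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          // A doubled quote inside quotes is an escaped quote character.
          if (i + 1 < line.length && line.charAt(i + 1) == '"') {
            sb.append('"')
            i += 1
          } else {
            inQuotes = false
          }
        } else {
          sb.append(c)
        }
      } else {
        c match {
          case '"' =>
            inQuotes = true
            wasQuoted = true
          case ',' => flush()
          case other => sb.append(other)
        }
      }
      i += 1
    }
    flush() // last field on the line
    fields.result()
  }
}
```

Applied to the third data row of the test file, `EmptyVsNull.parseLine("2015,Porsche,\"\",,")` yields an empty string for `model` and `None` for `comment` and `price`, which matches the "Expected" table in the description.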



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

