GitHub user sureshthalamati opened a pull request:
https://github.com/apache/spark/pull/12904
[SPARK-15125][SQL] New option for the CSV data source that allows users to
specify how to interpret empty quoted strings.
## What changes were proposed in this pull request?
This patch adds a new boolean option, emptyAsNull, to the CSV data source so
that users can specify whether empty quoted strings should be interpreted as
null or as an empty string. The default is to interpret them as null, which
matches the current behavior.
Example:
Input data:
year,make,model,comment,price
2016,Chevy,Bolt,"",29000.00
2015,Porsche,"",,
emptyAsNull = true (default, current behaviour):
scala> val df = sqlContext.read.format("csv").option("header",
"true").option("inferSchema", "true").option("nullValue",
null).load("/tmp/test.csv")
scala> df.filter("model is null").show
+----+-------+-----+-------+-----+
|year|   make|model|comment|price|
+----+-------+-----+-------+-----+
|2015|Porsche| null|   null| null|
+----+-------+-----+-------+-----+
emptyAsNull = false:
scala> val df = sqlContext.read.format("csv").option("header",
"true").option("inferSchema", "true").option("nullValue",
null).option("emptyAsNull", "false").load("/tmp/test.csv")
scala> df.filter("model is null").show
+----+----+-----+-------+-----+
|year|make|model|comment|price|
+----+----+-----+-------+-----+
+----+----+-----+-------+-----+
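For reviewers, the semantics illustrated by the example above can be summarised in a small standalone sketch. This is not the parser code from this patch: the names `EmptyAsNullSketch`, `Field`, `Quoted`, `Unquoted` and `interpret` are hypothetical, and the sketch assumes that unquoted empty fields are still read as null regardless of the option, as the current behaviour suggests.

```scala
// Hypothetical model of the emptyAsNull semantics; not the code in this patch.
object EmptyAsNullSketch {
  sealed trait Field
  // The value appeared inside quotes in the input, e.g. "" or "Bolt".
  final case class Quoted(text: String) extends Field
  // The value appeared without quotes, possibly empty (e.g. between two commas).
  final case class Unquoted(text: String) extends Field

  // None stands in for SQL null; Some(s) for a string value.
  def interpret(field: Field, emptyAsNull: Boolean = true): Option[String] =
    field match {
      // Only empty *quoted* strings are affected by the option.
      case Quoted("")   => if (emptyAsNull) None else Some("")
      // Assumption: an unquoted empty field stays null either way.
      case Unquoted("") => None
      case Quoted(s)    => Some(s)
      case Unquoted(s)  => Some(s)
    }
}
```

Under this model, the quoted empty model field of the 2015 row maps to `None` with the default setting and to `Some("")` with emptyAsNull = false, which is why the second `model is null` filter returns no rows.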
## How was this patch tested?
Added new unit tests to the CSVSuite.
@falaki @rxin
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sureshthalamati/spark empstring_fix_spark-15125
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12904.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12904
----
commit e6207e73e547d4ff1b564ec2ffc8a10cd7c00b02
Author: sureshthalamati
Date: 2016-05-04T18:35:59Z
This patch adds boolean option emptyAsNull to CSV datasource for user to
specify to interpret empty quoted strings as null or an empty string.
----