GitHub user kubatyszko opened a pull request:
https://github.com/apache/spark/pull/16316
Branch 2.0
## What changes were proposed in this pull request?
This changes the CSV parser so that parsing a numeric field is allowed to fail,
returning null in that case.
Combined with the "nullValue" option, which may already be in use for another
column, this allows handling of CSV sources that use an empty string to indicate
null in one column and a different, specific value to indicate null in another.
Currently the "nullValue" option can only be provided once, and we can't
assume that a data source uses only a single "null" indicator.
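The intended per-field behavior can be sketched as follows. This is a minimal illustration in plain Python, not the Spark implementation: `cast_field`, `NUMERIC_TYPES`, and the type-name strings are hypothetical names introduced here, and the sketch assumes that only an empty string in an Int, Short, or Long column falls back to null, as the commit messages describe.

```python
from typing import Optional

# Hypothetical stand-ins for the Int, Short and Long column types named in the commits.
NUMERIC_TYPES = {"int", "short", "long"}


def cast_field(raw: str, column_type: str, null_value: str = "") -> Optional[object]:
    """Illustrative only: convert one CSV field, returning None for nulls."""
    # The single configured nullValue marker always maps to null.
    if raw == null_value:
        return None
    if column_type in NUMERIC_TYPES:
        # Assumed proposed behavior: an empty numeric field becomes null
        # even when nullValue is set to some other marker such as "NA".
        if raw == "":
            return None
        return int(raw)
    # Non-numeric columns keep the raw string, including the empty string.
    return raw


# One row with two different null indicators: "NA" in a string column
# and "" in an integer column, with nullValue set to "NA".
row = ["NA", ""]
parsed = [
    cast_field(row[0], "string", null_value="NA"),
    cast_field(row[1], "int", null_value="NA"),
]
```

With a single `nullValue` of "NA", both fields in the sample row come out as null, while an empty string in a string column is still kept as an empty string.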
## How was this patch tested?
The patch was tested with a freshly compiled Spark 2.0.1 on a sample
data source that has "null" values in two columns: one uses "NA", set
through nullValue, and the other uses "" to indicate a missing integer value.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Katlean/spark branch-2.0
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16316.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16316
----
commit eb83747bbf426f8dc5c4f81f42ad3fecc78619af
Author: Sargis Dudaklyan <[email protected]>
Date: 2016-12-16T19:45:53Z
returns null when field is empty string on interger, long, and short column
types
commit 37b3b8c7d8eb8b43351ec2bd4c1c0432e046dc34
Author: Kuba Tyszko <[email protected]>
Date: 2016-12-16T20:29:45Z
Merge commit 'eb83747bbf426f8dc5c4f81f42ad3fecc78619af' into branch-2.0
The commit ensures that parsing CSV can default to NULL for Int,Short and
Long types.
----