GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21273
[WIP][SPARK-17916][SQL] Fix empty string being parsed as null when
nullValue is set.
## What changes were proposed in this pull request?
I propose bumping the uniVocity parser to version 2.6.3, in which quoted
empty strings are replaced by the empty value (passed to `setEmptyValue`)
instead of by `null`, as happens in the current version, 2.5.9:
https://github.com/uniVocity/univocity-parsers/blob/v2.6.3/src/main/java/com/univocity/parsers/csv/CsvParser.java#L125
The empty value for the writer is set to `""`, so an empty string in a
DataFrame/Dataset is stored as an empty quoted string `""`. The empty value for
the reader is set to the empty string (zero length). In this way, a saved empty
quoted string is read back as an empty string. Please see the tests for more details.
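The intended round trip can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark or uniVocity code: `write_row`, `read_row`, and their parameters are hypothetical names, and the model assumes fields contain no commas or quotes. It shows why a quoted `""` on write, paired with a zero-length empty value on read, keeps empty strings distinct from nulls.

```python
# Toy model of the round trip described above; not Spark or uniVocity code.
# write_row, read_row and their parameters are hypothetical names.
# Assumption: field values never contain commas or quote characters.

def write_row(values, empty_value='""', null_value=""):
    """Serialize one row: None becomes null_value, "" becomes empty_value."""
    fields = []
    for v in values:
        if v is None:
            fields.append(null_value)
        elif v == "":
            fields.append(empty_value)
        else:
            fields.append(v)
    return ",".join(fields)


def read_row(line, empty_value="", null_value=None):
    """Parse one row: a quoted empty field becomes empty_value,
    an unquoted empty field becomes null_value."""
    out = []
    for field in line.split(","):
        if field == '""':
            out.append(empty_value)
        elif field == "":
            out.append(null_value)
        else:
            out.append(field)
    return out


row = ["a", "", None]
line = write_row(row)
assert line == 'a,"",'          # empty string is quoted, null is left bare
assert read_row(line) == row    # both survive the round trip unchanged
```

If the reader instead mapped the quoted empty field to `None`, the second assertion would fail, which corresponds to the behaviour reported in SPARK-17916.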
## How was this patch tested?
Added the tests from PR https://github.com/apache/spark/pull/20068
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 univocity-2.6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21273.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21273
----
commit 457a21aad16c05b84268e416259f7aa332b0fc43
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-08T21:23:56Z
Getting tests from the PR #20068
commit cd78b12595c6fcbb71819416e0dd515a6bc82d91
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-08T21:24:37Z
Bump versions of uniVocity parser to 2.6.3
commit 598ba2da9ee6b014713f3ad41fc382590dcc7b37
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-08T21:25:09Z
Set values of empty strings in read and in write
----