GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21273

    [WIP][SPARK-17916][SQL] Fix empty string being parsed as null when nullValue is set.

    ## What changes were proposed in this pull request?
    
    I propose bumping the uniVocity parser from the current version 2.5.9 to
    2.6.3. In 2.6.3, quoted empty strings are replaced by the empty value
    (passed to `setEmptyValue`) instead of `null`, as they are in 2.5.9:
    
https://github.com/uniVocity/univocity-parsers/blob/v2.6.3/src/main/java/com/univocity/parsers/csv/CsvParser.java#L125
    
    The writer's empty value is set to `""`, so an empty string in a
    DataFrame/Dataset is saved as the quoted empty string `""`. The reader's
    empty value is set to the empty (zero-length) string, so a saved quoted
    empty string is read back as an empty string. Please see the tests for
    more details.
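
    The round trip described above can be sketched as a toy model. This is
    not uniVocity's or Spark's code: the class `EmptyValueRoundTrip` and its
    methods are hypothetical names, quoting of non-empty values is ignored,
    and mapping an unquoted empty field to `null` is an assumption about the
    default `nullValue` handling.

```java
// Toy model of the reader/writer empty-value settings; NOT the uniVocity implementation.
final class EmptyValueRoundTrip {
    // Assumed settings, mirroring the description above:
    // the writer emits "" (two quote characters) for an empty string,
    // the reader maps a quoted empty field to a zero-length string.
    static final String WRITER_EMPTY_VALUE = "\"\"";
    static final String READER_EMPTY_VALUE = "";

    /** Encodes one field: null becomes nothing, an empty string becomes a quoted empty string. */
    static String writeField(String value) {
        if (value == null) {
            return "";
        }
        if (value.isEmpty()) {
            return WRITER_EMPTY_VALUE;
        }
        return value;
    }

    /** Decodes one raw field: unquoted empty becomes null, quoted empty becomes the empty value. */
    static String readField(String raw) {
        if (raw.isEmpty()) {
            return null;
        }
        if (raw.equals("\"\"")) {
            return READER_EMPTY_VALUE;
        }
        return raw;
    }

    public static void main(String[] args) {
        // Print each value, its stored form, and what is read back.
        for (String value : new String[] {"a", "", null}) {
            String raw = writeField(value);
            System.out.println("[" + value + "] -> [" + raw + "] -> [" + readField(raw) + "]");
        }
    }
}
```

    In this model an empty string and a `null` are stored differently, so
    both survive a write followed by a read.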
    
    ## How was this patch tested?
    
    Added tests from the PR https://github.com/apache/spark/pull/20068


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 univocity-2.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21273
    
----
commit 457a21aad16c05b84268e416259f7aa332b0fc43
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-05-08T21:23:56Z

    Getting tests from the PR #20068

commit cd78b12595c6fcbb71819416e0dd515a6bc82d91
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-05-08T21:24:37Z

    Bump versions of uniVocity parser to 2.6.3

commit 598ba2da9ee6b014713f3ad41fc382590dcc7b37
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-05-08T21:25:09Z

    Set values of empty strings in read and in write

----

