[GitHub] spark pull request #23120: [SPARK-26151][SQL] Return partial results for bad...

MaxGekk Thu, 22 Nov 2018 13:57:02 -0800

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23120


    [SPARK-26151][SQL] Return partial results for bad CSV records

    ## What changes were proposed in this pull request?
    
    In this PR, I propose to change the behaviour of `UnivocityParser` and 
`FailureSafeParser` so that, for a bad input in the `PERMISSIVE` mode, they 
return all fields that were successfully parsed and converted to the expected 
types, instead of returning a row of all `null`s. For example, for the CSV line 
`0,2013-111-11 12:13:14` and the DDL schema `a int, b timestamp`, the new result 
is `Row(0, null)`.
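    
    A minimal, self-contained Scala sketch of the idea, under a simplified model. 
    The names `FieldSpec`, `ParseResult`, `PartialCsvSketch` and `parsePartial` are 
    hypothetical and are not part of Spark's `UnivocityParser`/`FailureSafeParser` 
    API, and the comma split ignores CSV quoting. Each field is converted 
    independently, so a failed conversion yields `null` only for that field, while 
    `isBad` still marks the whole record as malformed.
    
    ```scala
    import java.time.LocalDateTime
    import java.time.format.DateTimeFormatter
    import scala.util.Try
    
    // Hypothetical column spec (NOT Spark's API): a name plus a converter
    // from the raw CSV token to the expected type.
    final case class FieldSpec(name: String, convert: String => Any)
    
    // Converted values (null where a conversion failed) plus a flag
    // telling the caller that the record as a whole was malformed.
    final case class ParseResult(values: Seq[Any], isBad: Boolean)
    
    object PartialCsvSketch {
      private val tsFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
    
      // Models the DDL schema `a int, b timestamp`.
      val schema: Seq[FieldSpec] = Seq(
        FieldSpec("a", _.toInt),
        FieldSpec("b", s => LocalDateTime.parse(s, tsFormat))
      )
    
      // Simplified parser: splits on commas (no quotes or escapes) and converts
      // each token on its own, so one bad field does not null out the others.
      def parsePartial(line: String, schema: Seq[FieldSpec]): ParseResult = {
        val tokens = line.split(",", -1)
        val converted: Seq[Option[Any]] = schema.zipWithIndex.map { case (field, i) =>
          // A missing token or a failed conversion becomes None for this field only.
          if (i < tokens.length) Try(field.convert(tokens(i))).toOption else None
        }
        // The record is bad if the token count is wrong or any conversion failed.
        val isBad = tokens.length != schema.length || converted.exists(_.isEmpty)
        ParseResult(converted.map(_.getOrElse(null)), isBad)
      }
    }
    ```
    
    With this sketch, `parsePartial("0,2013-111-11 12:13:14", PartialCsvSketch.schema)` 
    returns values `Seq(0, null)` with `isBad = true`, mirroring the `Row(0, null)` 
    result described above. Keeping the flag separate from the values lets a caller 
    still recognize the record as malformed while keeping the fields that converted.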
    
    ## How was this patch tested?
    
    It was checked by the existing tests in `CsvSuite` and `CsvFunctionsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 failuresafe-partial-result

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23120
    
----
commit ddcfd90b8e6593594a5392cb19416c38902fb053
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-22T21:44:46Z

    Return partial results

commit 8f2d69d848b8242c529118436249019016069ca2
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-22T21:44:56Z

    Fix tests

----

