GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/23120
[SPARK-26151][SQL] Return partial results for bad CSV records
## What changes were proposed in this pull request?
In this PR, I propose to change the behaviour of `UnivocityParser` and
`FailureSafeParser` in the `PERMISSIVE` mode: for a bad input record, return
all fields that were parsed and converted to their expected types
successfully, instead of a row with all `null`s. For example, for the CSV line
`0,2013-111-11 12:13:14` and the DDL schema `a int, b timestamp`, the new
result is `Row(0, null)` (previously every field was `null`).
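
The difference between the two behaviours can be illustrated with a minimal, Spark-free sketch. This is not the code in `UnivocityParser` or `FailureSafeParser`. The names `CONVERTERS`, `parse_all_or_nothing`, and `parse_partial` are hypothetical, and the timestamp pattern is an assumption chosen only to match the example line above.

```python
from datetime import datetime

# Hypothetical per-column converters standing in for the converters a CSV
# parser would derive from the schema `a int, b timestamp`.
# The timestamp pattern is an assumption made for this illustration.
CONVERTERS = {
    "a": int,
    "b": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
}


def parse_all_or_nothing(line, columns):
    """Old-style behaviour: one bad field turns the whole row into nulls."""
    tokens = line.split(",")
    try:
        return tuple(CONVERTERS[c](t) for c, t in zip(columns, tokens))
    except ValueError:
        return tuple(None for _ in columns)


def parse_partial(line, columns):
    """Proposed behaviour: keep converted fields, null only the bad ones."""
    tokens = line.split(",")
    row = []
    for c, t in zip(columns, tokens):
        try:
            row.append(CONVERTERS[c](t))
        except ValueError:
            # Conversion failed for this field only; the others are kept.
            row.append(None)
    return tuple(row)


bad_line = "0,2013-111-11 12:13:14"
print(parse_all_or_nothing(bad_line, ["a", "b"]))  # (None, None)
print(parse_partial(bad_line, ["a", "b"]))         # (0, None)
```

The sketch catches conversion errors per field rather than per record, which is the essence of the proposal. Handling of the corrupt-record column and of records with the wrong number of tokens is deliberately left out.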
## How was this patch tested?
It was checked with the existing tests in `CsvSuite` and `CsvFunctionsSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 failuresafe-partial-result
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23120.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23120
----
commit ddcfd90b8e6593594a5392cb19416c38902fb053
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-22T21:44:46Z
Return partial results
commit 8f2d69d848b8242c529118436249019016069ca2
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-22T21:44:56Z
Fix tests
----