GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22374
[SPARK-25387][SQL] Fix for NPE caused by bad CSV input
## What changes were proposed in this pull request?
This PR fixes an NPE in `UnivocityParser` caused by malformed CSV input. In
some cases, the `uniVocity` parser can return `null` for bad input. I propose
checking the result of parsing so that the NPE is not propagated to upper layers.
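The idea of checking a possibly-`null` parse result can be sketched roughly as follows. This is only an illustration, not the code in the patch: `NullSafeParsing`, `tokenize`, and `parseLine` are hypothetical names, `tokenize` is a stand-in that mimics a tokenizer returning `null` on bad input (it is not the uniVocity API), and reporting the record as a `Left` is an assumption about what the caller might do with it.

```scala
object NullSafeParsing {
  // Hypothetical stand-in for a tokenizer: returns null when the line has an
  // unbalanced double quote, mimicking a parser that gives up on bad input.
  def tokenize(line: String): Array[String] =
    if (line.count(_ == '"') % 2 != 0) null else line.split(",", -1)

  // Wrap the possibly-null result in Option so that callers never dereference
  // null; a missing result is reported as a malformed record instead.
  def parseLine(line: String): Either[String, Array[String]] =
    Option(tokenize(line)) match {
      case Some(tokens) => Right(tokens)
      case None         => Left(s"Malformed record: $line")
    }
}
```

With this shape, the caller decides how to treat a malformed record (skip it, fail, or keep it aside) instead of hitting a `NullPointerException` deeper in the stack.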
## How was this patch tested?
I added a test that reproduces the issue and ran it as part of `CSVSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 npe-on-bad-csv
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22374.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22374
----
commit 6f9aba58927525f48bfdca00f76bf4890fe4cf30
Author: Maxim Gekk <max.gekk@...>
Date: 2018-09-09T15:20:54Z
Fix NPE in read with specified schema
commit 9284527d1c473facd589ea8195c017c37d076df5
Author: Maxim Gekk <max.gekk@...>
Date: 2018-09-09T15:33:41Z
Fix NPE in read on schema inferring
commit 05fe5faf191209366a37d5531e68b51364993dee
Author: Maxim Gekk <max.gekk@...>
Date: 2018-09-09T15:39:32Z
Checking multiLine mode
commit b20c12d7720aeba4c4e03f2a6c18ef076d5b894a
Author: Maxim Gekk <max.gekk@...>
Date: 2018-09-09T15:40:51Z
Adding ticket number to test's title
commit c9ccbee5e15bfa4ee67e12b256eefa544ce01f74
Author: Maxim Gekk <max.gekk@...>
Date: 2018-09-09T15:44:31Z
Fix imports
----