GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/17136
[SPARK-19783][SQL] Treat shorter/longer lengths of tokens as malformed
records in CSV parser
## What changes were proposed in this pull request?
If the number of tokens in a CSV record does not match the number of fields
expected by the schema, the record should be treated as malformed. This PR
modifies the parser so that records with too few or too many tokens are
handled as malformed.
This resolves a TODO in the existing code:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala#L239
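To illustrate the rule described above, here is a minimal standalone sketch in Python. It is not the Spark implementation: `MalformedRecordError`, `check_tokens`, and `parse_lines` are hypothetical names, and the naive split on commas ignores quoting and escaping. It only shows the idea that a row whose token count differs from the schema width, in either direction, is set aside as malformed.

```python
from typing import Dict, List, Sequence, Tuple


class MalformedRecordError(ValueError):
    """Hypothetical error: a row's token count differs from the schema width."""


def check_tokens(tokens: Sequence[str], schema: Sequence[str]) -> Dict[str, str]:
    # Both shorter and longer rows are rejected; nothing is padded or truncated.
    if len(tokens) != len(schema):
        raise MalformedRecordError(
            f"expected {len(schema)} tokens, got {len(tokens)}"
        )
    return dict(zip(schema, tokens))


def parse_lines(
    lines: Sequence[str], schema: Sequence[str]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split each line on commas and separate well-formed rows from malformed ones."""
    rows: List[Dict[str, str]] = []
    malformed: List[str] = []
    for line in lines:
        tokens = line.split(",")  # naive tokenizer, for illustration only
        try:
            rows.append(check_tokens(tokens, schema))
        except MalformedRecordError:
            malformed.append(line)
    return rows, malformed


rows, malformed = parse_lines(["1,a", "2", "3,c,extra"], ["id", "name"])
```

With a two-field schema, only the first line is accepted here; the one-token and three-token lines are both collected as malformed. What a caller then does with malformed lines (drop them, fail, or keep them with nulls) is a separate policy decision that this sketch does not model.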
## How was this patch tested?
Modified some existing tests and added new ones in `CSVSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SPARK-19783
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17136.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17136
----
commit aa290ee32ef09d6d018f261c3bccb85d08259ac5
Author: Takeshi Yamamuro <[email protected]>
Date: 2017-03-01T09:58:56Z
Treat shorter/longer lengths of tokens as malformed records in CSV parser
----