LuciferYang opened a new pull request #30518:
URL: https://github.com/apache/spark/pull/30518
### What changes were proposed in this pull request?
Spark CSV, opencsv and commons-csv parse some inputs differently. A typical
case is described in SPARK-33566: when a value contains both unescaped quotes
and unescaped qualifiers, the parsing results differ.
The reason for the difference is that Spark uses `STOP_AT_DELIMITER` as the default
`UnescapedQuoteHandling` when building `CsvParser`, and it is not configurable.
opencsv and commons-csv, on the other hand, use a parsing mechanism similar
to `STOP_AT_CLOSING_QUOTE` by default.
This PR therefore makes the `unescapedQuoteHandling` option configurable, so that
Spark can produce the same parsing result as opencsv and commons-csv.
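
To illustrate the difference between the two strategies, here is a minimal Python sketch. It is a simplified model written for this description, not the code of `CsvParser`, Spark, opencsv or commons-csv. The function names `parse_line` and `_parse_quoted` and the sample input are hypothetical, escape characters are not modelled, and a quote is assumed to be a closing quote only when it is followed by a delimiter or the end of the line. The real libraries may differ in details.

```python
def _parse_quoted(line, start, mode, delimiter, quote):
    """Parse a field that begins with a quote at index `start`.

    Returns (value, index of the delimiter that ends the field, or len(line)).
    """
    n = len(line)
    buf = []
    i = start + 1
    while i < n:
        ch = line[i]
        if ch == quote:
            nxt = line[i + 1] if i + 1 < n else None
            if nxt is None or nxt == delimiter:
                # Assumed rule: quote followed by delimiter/end closes the value.
                return "".join(buf), i + 1
            # Otherwise this is an unescaped quote inside a quoted value.
            if mode == "STOP_AT_DELIMITER":
                # Treat the whole value as unquoted: keep the opening quote
                # and stop at the next delimiter (or the end of the line).
                j = line.find(delimiter, i)
                if j == -1:
                    j = n
                return line[start:j], j
            # STOP_AT_CLOSING_QUOTE: keep the quote and keep reading.
            buf.append(quote)
            i += 1
            continue
        buf.append(ch)
        i += 1
    # No closing quote found: return what was accumulated.
    return "".join(buf), n


def parse_line(line, mode, delimiter=",", quote='"'):
    """Split one CSV line under a simplified model of the two strategies."""
    if mode not in ("STOP_AT_DELIMITER", "STOP_AT_CLOSING_QUOTE"):
        raise ValueError(f"unsupported mode: {mode}")
    fields = []
    i, n = 0, len(line)
    while i <= n:
        if i < n and line[i] == quote:
            value, i = _parse_quoted(line, i, mode, delimiter, quote)
        else:
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            value, i = line[i:j], j
        fields.append(value)
        i += 1  # step over the delimiter, or past the end of the line
    return fields


if __name__ == "__main__":
    sample = '1,"a,b"c,d",2'  # hypothetical input, not taken from SPARK-33566
    print(parse_line(sample, "STOP_AT_DELIMITER"))
    print(parse_line(sample, "STOP_AT_CLOSING_QUOTE"))
```

Under this model, the sample line yields four fields (`1`, `"a,b"c`, `d"`, `2`) with `STOP_AT_DELIMITER` and three fields (`1`, `a,b"c,d`, `2`) with `STOP_AT_CLOSING_QUOTE`, which is the kind of mismatch this PR lets users avoid by choosing the strategy.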
### Why are the changes needed?
Make the `unescapedQuoteHandling` option configurable when reading CSV, to make
parsing more flexible.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass the Jenkins or GitHub Actions checks
- Add a new test case similar to the one described in SPARK-33566
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.