Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218039838
@WeichenXu123 The [external CSV data
source](https://github.com/databricks/spark-csv) supports this, but it has an
issue with parsing unescaped quotes; see
https://issues.apache.org/jira/browse/SPARK-14103.
In that JIRA, I introduced `UnescapedQuoteHandling` to deal with the problem.
So, if we want to keep the original behaviour of the external CSV data source,
we need an option for handling unescaped quotes.
Personally, I think we should not allow a CSV record to span multiple lines.
The CSV data source currently uses `TextInputFormat`, which reads the data line
by line. So, a record spanning multiple lines could also span multiple HDFS
blocks, in which case it would not be read correctly.
So, I think we should not support this feature until we have a clear
solution.
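
To illustrate the line-by-line concern, here is a minimal sketch in plain
Python using the standard `csv` module. It is not Spark code and does not use
`TextInputFormat`; the sample data and the helper names `parse_line_by_line`
and `parse_whole_stream` are made up for this example. It only shows how a
quoted field containing a newline is broken apart when each physical line is
parsed on its own.

```python
import csv
import io

# Hypothetical sample: the record with id 2 has a quoted field
# that contains a newline.
DATA = 'id,comment\n1,"hello"\n2,"line one\nline two"\n3,"bye"\n'


def parse_line_by_line(text):
    """Parse every physical line independently, like a line-oriented reader."""
    rows = []
    for line in text.splitlines():
        # Each line is handed to the parser with no knowledge of its neighbours.
        rows.extend(csv.reader([line]))
    return rows


def parse_whole_stream(text):
    """Parse the full stream, so a quoted newline stays inside one record."""
    return list(csv.reader(io.StringIO(text, newline="")))


line_rows = parse_line_by_line(DATA)
stream_rows = parse_whole_stream(DATA)

print(len(line_rows))    # the multi-line record is split into two rows
print(len(stream_rows))  # header plus three records
print(stream_rows[2])
```

The stream parser only gets the right answer because it sees the whole record
at once, which is the guarantee that is missing when the input is divided at
line boundaries.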