[GitHub] spark pull request: [SPARK-15226][SQL]fix CSV file data-line with ...

HyukjinKwon Mon, 09 May 2016 18:58:09 -0700

Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13007#issuecomment-218039838
  
    @WeichenXu123 The [external CSV data source](https://github.com/databricks/spark-csv) 
supports this, but it has an issue with parsing unescaped quotes; see 
https://issues.apache.org/jira/browse/SPARK-14103.
    
    For that JIRA, I introduced `UnescapedQuoteHandling` to deal with the 
problem. So, if we want to keep the original behaviour of the external CSV 
data source, we need an option for handling unescaped quotes.
    
    Personally, I think we should not allow CSV parsing across multiple lines. 
The CSV data source currently uses `TextInputFormat`, which reads the data line 
by line. So a record spanning multiple lines could also span multiple HDFS 
blocks, in which case it would not be read correctly.
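
    To illustrate the line-by-line problem outside Spark, here is a minimal 
sketch using Python's standard `csv` module. It is only an analogy for what a 
line-oriented reader does, not the CSV data source's code; the sample data 
and the variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample: the record with id 1 has a quoted field
# that contains a newline, so it spans two physical lines.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Line-by-line parsing (analogous to a reader that hands over one
# physical line at a time): each line is parsed in isolation, so the
# quoted field is cut in half and becomes two separate rows.
line_by_line = [next(csv.reader([line])) for line in data.splitlines()]

# Stream parsing: the parser sees the whole input and can follow the
# open quote across the line break, keeping the record in one piece.
streamed = list(csv.reader(io.StringIO(data, newline='')))

print(len(line_by_line), "rows when parsed line by line")
print(len(streamed), "rows when parsed as a stream")
```

    The line-by-line variant produces one row per physical line, so the 
multi-line record is split. A distributed reader has the extra difficulty 
that the two halves may be handed to different tasks.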
    
    So, I think we should not support this feature until we have a clear 
solution.



