[GitHub] [spark] LuciferYang opened a new pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

GitBox Thu, 26 Nov 2020 06:50:34 -0800


LuciferYang opened a new pull request #30518:
URL: https://github.com/apache/spark/pull/30518



   ### What changes were proposed in this pull request?
   There are some differences between Spark CSV, opencsv and commons-csv. The
typical case is described in SPARK-33566: when a value contains both unescaped
quotes and an unescaped qualifier, the parsing results differ.
   
   The reason for the difference is that Spark uses `STOP_AT_DELIMITER` as the
default `UnescapedQuoteHandling` to build `CsvParser`, and it is not configurable.
   
   On the other hand, opencsv and commons-csv use a parsing mechanism similar
to `STOP_AT_CLOSING_QUOTE` by default.
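
   To make the difference between the two strategies concrete, here is a toy
model in Python. It is a sketch only, not the algorithm used by `CsvParser`,
opencsv or commons-csv: the names `parse_line` and `_read_quoted` are
hypothetical, quotes are assumed to be escaped by doubling, and only a single
line is handled. Real parsers may differ in edge cases.

```python
from enum import Enum


class UnescapedQuoteHandling(Enum):
    # Only the two strategies discussed above are modelled here.
    STOP_AT_DELIMITER = "STOP_AT_DELIMITER"
    STOP_AT_CLOSING_QUOTE = "STOP_AT_CLOSING_QUOTE"


def _read_quoted(line, start, mode, delimiter, quote):
    """Read a field that starts with a quote at line[start].

    Returns (value, index of the delimiter that ends the field, or len(line)).
    """
    n = len(line)
    out = []
    i = start + 1
    while i < n:
        ch = line[i]
        if ch != quote:
            out.append(ch)
            i += 1
        elif i + 1 < n and line[i + 1] == quote:
            out.append(quote)  # doubled quote: treated as an escaped quote
            i += 2
        elif i + 1 == n or line[i + 1] == delimiter:
            return "".join(out), i + 1  # proper closing quote
        elif mode is UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE:
            out.append(quote)  # keep the stray quote, keep looking for the closing quote
            i += 1
        else:
            # STOP_AT_DELIMITER: give up on quoting and return the raw text,
            # opening quote included, up to the next delimiter.
            end = line.find(delimiter, i)
            end = n if end == -1 else end
            return line[start:end], end
    return "".join(out), n  # no closing quote found on this line


def parse_line(line, mode, delimiter=",", quote='"'):
    """Split one CSV line into fields under the given toy strategy."""
    fields, i, n = [], 0, len(line)
    while i <= n:
        if i < n and line[i] == quote:
            value, i = _read_quoted(line, i, mode, delimiter, quote)
        else:
            end = line.find(delimiter, i)
            end = n if end == -1 else end
            value, i = line[i:end], end
        fields.append(value)
        i += 1  # skip the delimiter, or step past the end of the line
    return fields
```

   With the input `"a "b, c",d`, this toy model yields three fields under
`STOP_AT_DELIMITER` and two fields under `STOP_AT_CLOSING_QUOTE`, which is the
kind of divergence described above.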
   
   So this PR makes the `unescapedQuoteHandling` option configurable, so that
users can get the same parsing result as opencsv and commons-csv.
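
   As a rough usage sketch, assuming the option is exposed to the CSV reader
under the name `unescapedQuoteHandling` as proposed here, a PySpark caller
might wrap it as follows. The helper name `read_csv_with_quote_handling` is
hypothetical, and the accepted values are assumed to be the constant names of
the univocity `UnescapedQuoteHandling` enum.

```python
# Constant names of univocity's UnescapedQuoteHandling enum (assumed to be
# the values the option accepts).
UNESCAPED_QUOTE_HANDLING_MODES = {
    "STOP_AT_CLOSING_QUOTE",
    "BACK_TO_DELIMITER",
    "STOP_AT_DELIMITER",
    "SKIP_VALUE",
    "RAISE_ERROR",
}


def read_csv_with_quote_handling(spark, path, mode="STOP_AT_CLOSING_QUOTE"):
    """Read a CSV file with an explicit unescaped-quote strategy.

    `spark` is expected to be a SparkSession; `path` is the CSV location.
    """
    if mode not in UNESCAPED_QUOTE_HANDLING_MODES:
        raise ValueError(f"unknown unescapedQuoteHandling mode: {mode!r}")
    return (
        spark.read
        .option("header", "true")
        .option("unescapedQuoteHandling", mode)
        .csv(path)
    )
```

   Passing `mode="STOP_AT_DELIMITER"` would request the current default
behavior described above.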
    
   ### Why are the changes needed?
   Making the `unescapedQuoteHandling` option configurable when reading CSV
makes parsing more flexible.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   - Pass the Jenkins or GitHub Actions checks
   
   - Add a new test case similar to the one described in SPARK-33566
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.