[GitHub] [spark] srowen opened a new pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

GitBox Sat, 22 Aug 2020 15:08:42 -0700


srowen opened a new pull request #29516:
URL: https://github.com/apache/spark/pull/29516



   
   ### What changes were proposed in this pull request?
   
   Spark's CSV source can optionally ignore lines starting with a comment 
character. Some code paths check whether the option is set (i.e. not left at 
its default of `\0`) before applying comment logic, but many do not, including 
the one that passes the option to Univocity. As a result, rows beginning with a 
null character were treated as comments even when comment processing was 'disabled'.
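
A minimal sketch of the guard described above, written in plain Python rather than taken from Spark's source. The function names and the line-filtering model are hypothetical and only illustrate the idea; the one detail drawn from the description is that `\0` is the default meaning "comment option unset".

```python
from typing import Iterable, List

# Per the PR description, "\0" is the default and means the option is unset.
UNSET_COMMENT = "\0"


def is_comment_enabled(comment: str) -> bool:
    """Hypothetical helper: comment processing is on only if the option was set."""
    return comment != UNSET_COMMENT


def filter_lines_unguarded(lines: Iterable[str], comment: str = UNSET_COMMENT) -> List[str]:
    """Models the buggy behaviour: the comment check always runs,
    so a row starting with "\0" is dropped even when the option is unset."""
    return [line for line in lines if not line.startswith(comment)]


def filter_lines_guarded(lines: Iterable[str], comment: str = UNSET_COMMENT) -> List[str]:
    """Models the intended behaviour: skip comment processing entirely
    unless the option was explicitly set."""
    if not is_comment_enabled(comment):
        return list(lines)
    return [line for line in lines if not line.startswith(comment)]
```

With the rows `["a,b", "\0x,y", "#note"]` and the option left unset, the unguarded version drops the second row while the guarded version keeps all three; passing `comment="#"` to the guarded version drops only `#note`.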
   
   ### Why are the changes needed?
   
   To avoid dropping rows that start with a null character when this is not 
requested or intended. See the JIRA ticket (SPARK-32614) for an example.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Nothing beyond the effect of the bug fix.
   
   ### How was this patch tested?
   
   Existing tests plus a new test case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



