panbingkun commented on PR #48653:
URL: https://github.com/apache/spark/pull/48653#issuecomment-2465929498
> @panbingkun There are CSV options:
>
> * ignoreLeadingWhiteSpace
> * ignoreTrailingWhiteSpace
>
> They are off in read by default, but when you set them on, do they solve
your issue?
- Taking the following as an example:
```sql
-- parsed incorrectly: b becomes null
spark-sql (default)> select from_csv('1, 1', 'a INT, b INT',
map('ignoreLeadingWhiteSpace', 'false'));
from_csv(1, 1)
{"a":1,"b":null}
Time taken: 0.099 seconds, Fetched 1 row(s)
-- parses correctly
spark-sql (default)> select from_csv('1, 1', 'a INT, b INT',
map('ignoreLeadingWhiteSpace', 'true'));
from_csv(1, 1)
{"a":1,"b":1}
Time taken: 0.037 seconds, Fetched 1 row(s)
```
```sql
-- parses correctly
spark-sql (default)> select from_csv('1, 1', 'a INT, b DOUBLE',
map('ignoreLeadingWhiteSpace', 'false'));
from_csv(1, 1)
{"a":1,"b":1.0}
Time taken: 0.065 seconds, Fetched 1 row(s)
-- parses correctly
spark-sql (default)> select from_csv('1, 1', 'a INT, b DOUBLE',
map('ignoreLeadingWhiteSpace', 'true'));
from_csv(1, 1)
{"a":1,"b":1.0}
Time taken: 0.035 seconds, Fetched 1 row(s)
```
- Yes. For the same data `1, 1`, when the schema is `a INT, b INT`, setting
`ignoreLeadingWhiteSpace = true` works around the issue and yields the correct result.
However, when the schema is `a INT, b DOUBLE`, no option is needed and the data is
parsed correctly anyway. This inconsistent behavior looks very `weird`.
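
For reference, the same asymmetry can be reproduced through the DataFrame API. Below is a minimal sketch, assuming a local `SparkSession`; the object name, column name `value`, and schema variable names are illustrative and not from this PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types._

object FromCsvWhitespaceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from_csv whitespace demo")
      .getOrCreate()
    import spark.implicits._

    // Single CSV record with a leading space before the second field.
    val df = Seq("1, 1").toDF("value")

    val intSchema = StructType(Seq(
      StructField("a", IntegerType), StructField("b", IntegerType)))
    val doubleSchema = StructType(Seq(
      StructField("a", IntegerType), StructField("b", DoubleType)))

    // a INT, b INT: the leading space breaks INT parsing
    // unless ignoreLeadingWhiteSpace is enabled.
    df.select(from_csv($"value", intSchema,
      Map("ignoreLeadingWhiteSpace" -> "false")).as("csv")).show(false)  // {1, null}
    df.select(from_csv($"value", intSchema,
      Map("ignoreLeadingWhiteSpace" -> "true")).as("csv")).show(false)   // {1, 1}

    // a INT, b DOUBLE: the same input parses correctly even without the option.
    df.select(from_csv($"value", doubleSchema,
      Map("ignoreLeadingWhiteSpace" -> "false")).as("csv")).show(false)  // {1, 1.0}

    spark.stop()
  }
}
```

The asymmetry most likely comes down to the underlying numeric conversion: `java.lang.Double.parseDouble` ignores leading and trailing whitespace, whereas `Integer.parseInt` does not, so `" 1"` converts to a DOUBLE but fails as an INT.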