panbingkun commented on PR #48653:
URL: https://github.com/apache/spark/pull/48653#issuecomment-2465929498
> @panbingkun There are CSV options:
>
> * ignoreLeadingWhiteSpace
> * ignoreTrailingWhiteSpace
>
> They are off in read by default, but when you set them on, do they solve
your issue?
- Taking the following as an example:
```sql
-- parsed incorrectly: b becomes null
spark-sql (default)> select from_csv('1, 1', 'a INT, b INT',
map('ignoreLeadingWhiteSpace', 'false'));
from_csv(1, 1)
{"a":1,"b":null}
Time taken: 0.099 seconds, Fetched 1 row(s)
-- parses correctly
spark-sql (default)> select from_csv('1, 1', 'a INT, b INT',
map('ignoreLeadingWhiteSpace', 'true'));
from_csv(1, 1)
{"a":1,"b":1}
Time taken: 0.037 seconds, Fetched 1 row(s)
```
```sql
-- parses correctly
spark-sql (default)> select from_csv('1, 1', 'a INT, b DOUBLE',
map('ignoreLeadingWhiteSpace', 'false'));
from_csv(1, 1)
{"a":1,"b":1.0}
Time taken: 0.065 seconds, Fetched 1 row(s)
-- parses correctly
spark-sql (default)> select from_csv('1, 1', 'a INT, b DOUBLE',
map('ignoreLeadingWhiteSpace', 'true'));
from_csv(1, 1)
{"a":1,"b":1.0}
Time taken: 0.035 seconds, Fetched 1 row(s)
```
- Yes. For the same data `1, 1`, when the schema is `a INT, b INT`, setting
`ignoreLeadingWhiteSpace = true` works around the issue and yields the correct result.
However, when the schema is `a INT, b DOUBLE`, no option is needed and the data is
parsed correctly anyway. This inconsistent behavior looks very `weird`.
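
For reference, the same asymmetry can be reproduced through the DataFrame API. Below is a minimal sketch, assuming a local `SparkSession`; the object name, column name `value`, and schema variable names are illustrative and not from this PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types._

object FromCsvWhitespaceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from_csv whitespace demo")
      .getOrCreate()
    import spark.implicits._

    // Single CSV record with a leading space before the second field.
    val df = Seq("1, 1").toDF("value")

    val intSchema = StructType(Seq(
      StructField("a", IntegerType), StructField("b", IntegerType)))
    val doubleSchema = StructType(Seq(
      StructField("a", IntegerType), StructField("b", DoubleType)))

    // a INT, b INT: the leading space breaks INT parsing
    // unless ignoreLeadingWhiteSpace is enabled.
    df.select(from_csv($"value", intSchema,
      Map("ignoreLeadingWhiteSpace" -> "false")).as("csv")).show(false)  // {1, null}
    df.select(from_csv($"value", intSchema,
      Map("ignoreLeadingWhiteSpace" -> "true")).as("csv")).show(false)   // {1, 1}

    // a INT, b DOUBLE: the same input parses correctly even without the option.
    df.select(from_csv($"value", doubleSchema,
      Map("ignoreLeadingWhiteSpace" -> "false")).as("csv")).show(false)  // {1, 1.0}

    spark.stop()
  }
}
```

The asymmetry most likely comes down to the underlying numeric conversion: `java.lang.Double.parseDouble` ignores leading and trailing whitespace, whereas `Integer.parseInt` does not, so `" 1"` converts to a DOUBLE but fails as an INT.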