HyukjinKwon commented on code in PR #37009:
URL: https://github.com/apache/spark/pull/37009#discussion_r925094888
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##########
@@ -317,7 +317,15 @@ class UnivocityParser(
if (skipRow) {
row.setNullAt(i)
} else {
- row(i) = valueConverters(i).apply(getToken(tokens, i))
+ // This is required to not set value as null ,
+ // 1. If the missing value at the end of line.
+ // 2. If the missing value at the beginning of line.
+ if (!options.naFilter && (i>=tokens.length ||
+ (i==0 && getToken(tokens, i).length == 0))) {
+ row(i) = valueConverters(i).apply("")
Review Comment:
`""` is interpreted differently depending on how `nullValue` and `emptyValue`
are set, so I don't think we should rely on this conversion here. Can we
control this behavior through one of these options instead?
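
To illustrate the concern, here is a minimal sketch, not the actual `UnivocityParser` code. It assumes a nullable string converter that treats any token equal to the configured `nullValue` as null; `EmptyTokenSketch` and `convert` are hypothetical names used only for this example.

```scala
// Simplified model only; the names below are hypothetical and are not Spark APIs.
object EmptyTokenSketch {
  // Stand-in for a nullable string converter: a null token, or a token equal
  // to the configured nullValue, becomes None; any other token is kept as-is.
  def convert(token: String, nullValue: String): Option[String] =
    if (token == null || token == nullValue) None else Some(token)

  def main(args: Array[String]): Unit = {
    // With nullValue set to "", the hard-coded "" token is treated as null.
    println(convert("", nullValue = ""))
    // With nullValue set to "NA", the same "" token is kept as an empty string.
    println(convert("", nullValue = "NA"))
  }
}
```

Under that assumption, the result of passing a hard-coded `""` depends on the user's option values rather than on `naFilter` alone.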
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]