[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37009: [SPARK-38292][PYTHON]Support na_filter for pyspark.pandas.read_csv

GitBox Tue, 19 Jul 2022 19:03:25 -0700


HyukjinKwon commented on code in PR #37009:
URL: https://github.com/apache/spark/pull/37009#discussion_r925094888



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##########
@@ -317,7 +317,15 @@ class UnivocityParser(
         if (skipRow) {
           row.setNullAt(i)
         } else {
-          row(i) = valueConverters(i).apply(getToken(tokens, i))
+          // Avoid setting the value to null when na_filter is disabled and:
+          // 1. the value is missing at the end of the line, or
+          // 2. the value is missing at the beginning of the line.
+          if (!options.naFilter && (i >= tokens.length ||
+            (i == 0 && getToken(tokens, i).length == 0))) {
+            row(i) = valueConverters(i).apply("")

Review Comment:
   For `""`, the result depends on how `nullValue` and `emptyValue` are set, so I
don't think we should rely on this conversion here. Can we control this behavior
by setting one of these options instead?
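To make the concern concrete, here is a minimal Scala model, assuming a simplified version of the null check that Spark's `UnivocityParser` applies inside a column's value converter before the actual conversion runs (a nullable column is assumed). The names `NullValueModel`, `Opts` and `convert` are hypothetical and exist only for this sketch; null is modeled as `None`. With Spark's default `nullValue` of `""`, passing `""` to the converter still produces null, so the diff's `valueConverters(i).apply("")` only yields an empty string when `nullValue` is set to something else. `emptyValue` is not modeled here; it is applied earlier, by the tokenizer, to quoted empty fields.

```scala
// Simplified model (not Spark's actual code) of the null check a CSV column's
// value converter applies before converting a token. Null is modeled as None.
object NullValueModel {
  // Hypothetical stand-in for the relevant part of CSVOptions.
  final case class Opts(nullValue: String = "") // Spark's default nullValue is ""

  // Assumes a nullable column: a missing token, or one equal to nullValue,
  // becomes null; any other token is handed to the converter.
  def convert[T](datum: String, opts: Opts)(converter: String => T): Option[T] =
    if (datum == null || datum == opts.nullValue) None
    else Some(converter(datum))

  def main(args: Array[String]): Unit = {
    val asString: String => String = identity
    // Default nullValue: feeding "" to the converter still yields null.
    println(convert("", Opts())(asString))                   // None
    // Custom nullValue: "" survives as an empty string.
    println(convert("", Opts(nullValue = "NA"))(asString))   // Some()
    // Custom nullValue: the configured marker becomes null instead.
    println(convert("NA", Opts(nullValue = "NA"))(asString)) // None
  }
}
```

In other words, whether the empty token ends up as null is decided by the options, which is why controlling it through `nullValue`/`emptyValue` seems cleaner than special-casing the token positions.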



-- 
This is an automated message from the Apache Git Service.