[GitHub] [spark] pralabhkumar commented on a diff in pull request #37009: [SPARK-38292][PYTHON]Support na_filter for pyspark.pandas.read_csv

GitBox Thu, 21 Jul 2022 09:32:39 -0700


pralabhkumar commented on code in PR #37009:
URL: https://github.com/apache/spark/pull/37009#discussion_r926888668



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##########
@@ -317,7 +317,15 @@ class UnivocityParser(
         if (skipRow) {
           row.setNullAt(i)
         } else {
-          row(i) = valueConverters(i).apply(getToken(tokens, i))
+          // This is required so that the value is not set to null when:
+          // 1. the missing value is at the end of the line, or
+          // 2. the missing value is at the beginning of the line.
+          if (!options.naFilter && (i >= tokens.length ||
+            (i == 0 && getToken(tokens, i).length == 0))) {
+            row(i) = valueConverters(i).apply("")

Review Comment:
   This is only called if the missing value is at the beginning or the end of 
the line. In that case, the current code goes into the exception block and 
updates the row with default values (which replaces the missing values with 
null). When `options.naFilter` is false, we do not want to replace them with 
null; we want to leave them as missing (empty) values.
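To make the two branches concrete, here is a minimal Scala sketch that models only the decision described in this comment, outside Spark. `NaFilterSketch`, `cellValue` and `parseRow` are hypothetical names. `None` stands in for a null cell. The sketch does not model `valueConverters`, the exception path in `UnivocityParser`, or `nullValue` handling for columns in the middle of the line. It assumes a null leading token counts as empty.

```scala
// Hypothetical, simplified model of the naFilter branch discussed above.
// It is NOT Spark's UnivocityParser; it only mirrors the per-column condition.
object NaFilterSketch {
  /** Value for column `i`; `None` stands in for a SQL null. */
  def cellValue(tokens: Array[String], i: Int, naFilter: Boolean): Option[String] = {
    // 1. Missing value at the end of the line: fewer tokens than columns.
    val missingAtEnd = i >= tokens.length
    // 2. Missing value at the beginning of the line: the first token is empty
    //    (a null token is treated as empty here, an assumption of this sketch).
    val missingAtStart = i == 0 && !missingAtEnd && Option(tokens(0)).forall(_.isEmpty)
    if (missingAtEnd || missingAtStart) {
      if (naFilter) None // behaviour described in the review: the value becomes null
      else Some("")      // naFilter = false: keep it as an empty (missing) value
    } else {
      Some(tokens(i))    // other columns are passed through unchanged
    }
  }

  /** Builds one row of `numColumns` cells from the parsed tokens. */
  def parseRow(tokens: Array[String], numColumns: Int, naFilter: Boolean): Seq[Option[String]] =
    (0 until numColumns).map(i => cellValue(tokens, i, naFilter))
}
```

For example, `NaFilterSketch.parseRow(Array("a", "b"), 3, naFilter = false)` keeps the third column as `""`, while the same call with `naFilter = true` yields `None` for it.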



-- 
This is an automated message from the Apache Git Service.