[GitHub] [spark] MaxGekk commented on a change in pull request #34905: [SPARK-37575][SQL][FOLLOWUP] Update migration guide for null values saving in CSV data source

GitBox Tue, 14 Dec 2021 22:20:53 -0800


MaxGekk commented on a change in pull request #34905:
URL: https://github.com/apache/spark/pull/34905#discussion_r769283078




##########
File path: docs/sql-migration-guide.md
##########
@@ -52,6 +52,8 @@ license: |
 
   - Since Spark 3.3, the `strfmt` in `format_string(strfmt, obj, ...)` and `printf(strfmt, obj, ...)` no longer supports using "0$" to specify the first argument; the first argument should always be referenced by "1$" when an argument index is used to indicate the position of the argument in the argument list.
 
+  - Since Spark 3.3, nulls are written as empty strings in the CSV data source by default. In Spark 3.2 or earlier, nulls were written as quoted empty strings, `""`. To restore the previous behavior, set `nullValue` to `""`.

Review comment:
       > To restore the previous behavior, set `nullValue` to `""`
   
   Actually, it is correct, but if a user sets the option literally, as recommended:
   ```scala
   scala> val df = Seq("abc", null, "def").toDF()
   df: org.apache.spark.sql.DataFrame = [value: string]
   
   scala> df.repartition(1).write.option("nullValue", "").mode("overwrite").csv("/Users/maximgekk/tmp/csv3")
   ```
   ```
   $ csv3 cat ./part-00000-5830ac7c-3653-41ec-a2f7-c56934ef56d9-c000.csv
   abc
   def
   ```
   but:
   ```scala
   scala> df.repartition(1).write.option("nullValue", "\"\"").mode("overwrite").csv("/Users/maximgekk/tmp/csv4")
   ```
   ```
   $ csv4 cat ./part-00000-6a5b0628-8924-4300-9699-89f4df903db9-c000.csv
   abc
   ""
   def
   ```
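
   The difference comes down to which string actually reaches the `nullValue` option: the guide's `""` can be read either as the Scala empty-string literal or as the two-character text `""`. A minimal, Spark-free sketch of the two literals (the object name `NullValueLiterals` is hypothetical, used only for illustration):

   ```scala
   // Hypothetical helper object: shows why `""` in the migration guide is ambiguous
   // when copied into Scala source code.
   object NullValueLiterals {
     // The Scala literal "" is the empty string: zero characters.
     val emptyLiteral: String = ""
     // The Scala literal "\"\"" is two double-quote characters, i.e. the text `""`.
     val quotedEmpty: String = "\"\""

     def main(args: Array[String]): Unit = {
       println(s"emptyLiteral length = ${emptyLiteral.length}") // 0
       println(s"quotedEmpty length  = ${quotedEmpty.length}")  // 2
       println(s"quotedEmpty text    = $quotedEmpty")           // ""
     }
   }
   ```

   Only the second value, passed as `.option("nullValue", "\"\"")`, produced the quoted `""` row in the output above.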



