wayneguow commented on a change in pull request #34853:
URL: https://github.com/apache/spark/pull/34853#discussion_r768296752
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
##########
@@ -805,6 +805,22 @@ abstract class CSVSuite
}
}
+ test("SPARK-37575: null values should not reflect to any characters by
default") {
+ val litNull: String = null
+ val data = Seq(("Tesla", litNull, ""))
+ withTempPath { path =>
+ val csvDir = new File(path, "csv")
+ val cars = data.toDF("make", "comment", "blank")
+ cars.coalesce(1).write.csv(csvDir.getCanonicalPath)
+
+ csvDir.listFiles().filter(_.getName.endsWith("csv")).foreach({ csvFile =>
+ val readBack = Files.readAllBytes(csvFile.toPath)
+ val expected = ("Tesla,,\"\"" + Properties.lineSeparator).getBytes()
+ assert(readBack === expected)
+ })
+ }
Review comment:
The reason I read the written files back with a plain file reader is to show other users clearly how the saved csv files change after this PR. Both before and after this PR, null values are read back as null in a DataFrame, so the change might not be obvious otherwise.
But I think your advice is good, so I have added another part that reads the data back with Spark.
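For illustration, a minimal sketch of what that Spark read-back part could look like (not the exact code added in the PR; it reuses `csvDir` and the column names from the snippet above, inside the same `withTempPath` block):

      // Hypothetical sketch: read the files written above back with Spark and
      // check that the null value is still read back as null. This is exactly
      // why the raw-bytes comparison is needed to make the on-disk change visible.
      val readBack = spark.read
        .schema("make STRING, comment STRING, blank STRING")
        .csv(csvDir.getCanonicalPath)
      assert(readBack.where("comment IS NULL").count() === 1L)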
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.