[GitHub] [spark] MaxGekk commented on a change in pull request #34853: [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings

GitBox Mon, 13 Dec 2021 21:42:22 -0800


MaxGekk commented on a change in pull request #34853:
URL: https://github.com/apache/spark/pull/34853#discussion_r768332310




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
##########
@@ -805,6 +805,22 @@ abstract class CSVSuite
     }
   }
 
+  test("SPARK-37575: null values should not reflect to any characters by default") {
+    val litNull: String = null
+    val data = Seq(("Tesla", litNull, ""))
+    withTempPath { path =>
+      val csvDir = new File(path, "csv")
+      val cars = data.toDF("make", "comment", "blank")
+      cars.coalesce(1).write.csv(csvDir.getCanonicalPath)
+
+      csvDir.listFiles().filter(_.getName.endsWith("csv")).foreach({ csvFile =>
+        val readBack = Files.readAllBytes(csvFile.toPath)
+        val expected = ("Tesla,,\"\"" + Properties.lineSeparator).getBytes()
+        assert(readBack === expected)
+      })
+    }

Review comment:
       > I want to show other users what changes clearly in saved csv files 
after this PR.
   
   You can show the same thing by reading the output back with Spark's **text** 
datasource, which just splits the input by lines. Even that splitting can be 
avoided by using the `wholetext` option.
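
A minimal sketch of that suggestion, assuming a local `SparkSession` and a temporary output directory. The object and method names (`WholeTextReadBack`, `writeSample`, `readBack`) are hypothetical and not part of the PR; only the `wholetext` option of the text datasource is taken from the comment above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper, for illustration only.
object WholeTextReadBack {

  // Writes the same single row as the test in the diff, as one CSV part file.
  def writeSample(spark: SparkSession, dir: String): Unit = {
    import spark.implicits._
    val litNull: String = null
    Seq(("Tesla", litNull, ""))
      .toDF("make", "comment", "blank")
      .coalesce(1)
      .write
      .csv(dir)
  }

  // Reads every data file under `dir` with the text datasource.
  // With wholetext=true each file becomes a single row, so the raw
  // content, including line separators, is preserved.
  def readBack(spark: SparkSession, dir: String): Array[String] = {
    import spark.implicits._
    spark.read.option("wholetext", true).text(dir).as[String].collect()
  }
}
```

The returned strings can then be compared against the expected file content directly, without listing files or reading bytes by hand. What exactly the file contains depends on the Spark version and on whether this PR is applied.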
   
   BTW, the code `+ Properties.lineSeparator` assumes that Spark/uniVocity 
uses the platform's line separator when writing.
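
One way to avoid that assumption, sketched here rather than taken from the PR, is to pin the separator explicitly on both sides: pass the CSV `lineSep` write option and build the expected bytes from the same constant. The names `ExplicitLineSep`, `expectedBytes`, and `writeWithLineSep` are hypothetical, and the choice of `"\n"` and UTF-8 is an assumption made for the example.

```scala
import java.nio.charset.StandardCharsets
import org.apache.spark.sql.SparkSession

// Hypothetical helper, for illustration only.
object ExplicitLineSep {

  // Single source of truth for the separator used in write and in the assertion.
  val lineSep: String = "\n"

  // Expected file content for one CSV line, independent of the platform default.
  def expectedBytes(line: String): Array[Byte] =
    (line + lineSep).getBytes(StandardCharsets.UTF_8)

  // Writes the sample row with the separator set explicitly via the `lineSep` option.
  def writeWithLineSep(spark: SparkSession, dir: String): Unit = {
    import spark.implicits._
    val litNull: String = null
    Seq(("Tesla", litNull, ""))
      .toDF("make", "comment", "blank")
      .coalesce(1)
      .write
      .option("lineSep", lineSep)
      .csv(dir)
  }
}
```

With this shape the comparison no longer depends on `Properties.lineSeparator`, which differs between platforms.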




