Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20949#discussion_r197643948
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -512,6 +512,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
}
}
+ test("Save csv with custom charset") {
+ Seq("iso-8859-1", "utf-8", "windows-1250").foreach { encoding =>
--- End diff ---
Could you check the `UTF-16` and `UTF-32` encodings too? The written CSV
files must contain [BOMs](https://en.wikipedia.org/wiki/Byte_order_mark) for
those encodings. I am not sure that the Spark CSV datasource is able to read
them in per-line mode (`multiLine` set to `false`). You probably need to switch
to multiLine mode or read the files back with Scala's standard library, as is
done in JsonSuite:
https://github.com/apache/spark/blob/c7e2742f9bce2fcb7c717df80761939272beff54/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala#L2322-L2338
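
For reference, a minimal sketch of what such a check could look like. It is not
the PR's actual test: it assumes the `encoding` write option this PR introduces,
and the `withTempPath` / `testImplicits` helpers available to `CSVSuite` via
`SQLTestUtils`. It verifies the BOM bytes directly with `java.nio` instead of
reading through the datasource, which sidesteps the per-line-mode concern above:

```scala
// A sketch only: write a small DataFrame as CSV in UTF-16 and check that
// each part file starts with a byte order mark.
test("Save csv with UTF-16 charset writes a BOM") {
  import testImplicits._
  withTempPath { path =>
    Seq(("a", 1), ("b", 2)).toDF("s", "i")
      .write
      .option("encoding", "UTF-16")   // assumed: the write option added by this PR
      .csv(path.getCanonicalPath)

    val partFiles = path.listFiles().filter(_.getName.startsWith("part-"))
    assert(partFiles.nonEmpty)
    partFiles.foreach { f =>
      val bytes = java.nio.file.Files.readAllBytes(f.toPath)
      // The JVM's UTF-16 encoder emits a big-endian BOM: 0xFE 0xFF.
      assert(bytes.length >= 2 &&
        bytes(0) == 0xFE.toByte && bytes(1) == 0xFF.toByte)
    }
  }
}
```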
---