Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20949#discussion_r197643948
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -512,6 +512,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
}
}
+ test("Save csv with custom charset") {
+ Seq("iso-8859-1", "utf-8", "windows-1250").foreach { encoding =>
--- End diff ---
Could you check the `UTF-16` and `UTF-32` encodings too? The written CSV
files must contain [BOMs](https://en.wikipedia.org/wiki/Byte_order_mark) for
those encodings. I am not sure that the Spark CSV datasource is able to read
them in per-line mode (`multiLine` set to `false`). You probably need to switch
to multiLine mode or read the files back with Scala's standard library, as is
done in JsonSuite:
https://github.com/apache/spark/blob/c7e2742f9bce2fcb7c717df80761939272beff54/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala#L2322-L2338
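
For reference, a minimal sketch of what such a check could look like. It is not
the PR's actual test: it assumes the `encoding` write option this PR introduces,
and the `withTempPath` / `testImplicits` helpers available to `CSVSuite` via
`SQLTestUtils`. It verifies the BOM bytes directly with `java.nio` instead of
reading through the datasource, which sidesteps the per-line-mode concern above:

```scala
// A sketch only: write a small DataFrame as CSV in UTF-16 and check that
// each part file starts with a byte order mark.
test("Save csv with UTF-16 charset writes a BOM") {
  import testImplicits._
  withTempPath { path =>
    Seq(("a", 1), ("b", 2)).toDF("s", "i")
      .write
      .option("encoding", "UTF-16")   // assumed: the write option added by this PR
      .csv(path.getCanonicalPath)

    val partFiles = path.listFiles().filter(_.getName.startsWith("part-"))
    assert(partFiles.nonEmpty)
    partFiles.foreach { f =>
      val bytes = java.nio.file.Files.readAllBytes(f.toPath)
      // The JVM's UTF-16 encoder emits a big-endian BOM: 0xFE 0xFF.
      assert(bytes.length >= 2 &&
        bytes(0) == 0xFE.toByte && bytes(1) == 0xFF.toByte)
    }
  }
}
```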
---