[GitHub] [spark] stczwd commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


stczwd commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454791986



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala
##
@@ -39,6 +39,10 @@ class CsvOutputWriter(
 
   private val gen = new UnivocityGenerator(dataSchema, writer, params)
 
+  if (params.bom) {
+writer.write(0xFEFF)

Review comment:
   Excel. It will change the actual value if we add `0xFEFF` in the front.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] stczwd commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


stczwd commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454746965



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala
##
@@ -39,6 +39,10 @@ class CsvOutputWriter(
 
   private val gen = new UnivocityGenerator(dataSchema, writer, params)
 
+  if (params.bom) {
+writer.write(0xFEFF)

Review comment:
   We meet the same problem in our project, and we use `0xEFBBBF` as 
default BOM for UTF-8, it will change the value if we use `0xFEFF`. Besides, we 
have meet other problems, such as the commas were used incorrectly or quotation 
marks were not displayed properly. If we fix these problems, it will cause 
other users can not read these files with other tools.
   
   Hm, what I trying to say is, maybe it is not a good idea to change 
CsvOutputWriter to fit Excel format. It can be done in project before use 
downloads the csv files or just use Excel to import.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org