GitHub user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-199118697
  
    I found an issue similar to this one, 
[SPARK-1849](https://issues.apache.org/jira/browse/SPARK-1849). 
    
    I think we might have to drop support for encodings that are not 
ASCII-compatible. It looks like this PR will support encodings in general, but 
I cannot guarantee that it supports all of them: some encodings might write a 
BOM-like header at the start of the data.
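
A minimal sketch of the BOM concern above, using only JDK charsets. The class and method names (`BomDemo`, `startsWithUtf16BeBom`) are hypothetical and are not part of Spark or this PR. It shows that Java's `UTF-16` charset prepends the byte order mark `FE FF` when encoding, while `UTF-8` does not, and that every character, including the comma delimiter, takes two bytes in UTF-16:

```java
import java.nio.charset.StandardCharsets;

final class BomDemo {
    // True when the bytes begin with the big-endian UTF-16 byte order mark FE FF.
    static boolean startsWithUtf16BeBom(byte[] bytes) {
        return bytes.length >= 2
            && (bytes[0] & 0xFF) == 0xFE
            && (bytes[1] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        String row = "a,b";  // illustrative CSV row, not taken from the PR

        byte[] utf8 = row.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = row.getBytes(StandardCharsets.UTF_16);

        // UTF-8: one byte per ASCII character, no header.
        System.out.println("UTF-8  length=" + utf8.length
            + " bom=" + startsWithUtf16BeBom(utf8));
        // UTF-16: two-byte BOM, then two bytes per character.
        System.out.println("UTF-16 length=" + utf16.length
            + " bom=" + startsWithUtf16BeBom(utf16));
    }
}
```

A reader that scans raw bytes for `,` or `\n` would therefore misread such data unless it decodes with the right charset first and accounts for the header.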
    
    Since Spark CSV already supports the encoding option, I cannot come up 
with anything beyond the three options below:
    
    - Only the CSV data source supports some encodings, for backward 
compatibility, but it excludes encodings that are not ASCII-compatible and 
throws an exception when given one.
    
    - The CSV data source supports other encodings in this way, but the 
documentation mentions that not all encodings are guaranteed to work.
    
    - The CSV data source supports all the encodings, and we add tests for all 
of them (maybe using this [encoding 
list](https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html)
 for Java).
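
The first option could be sketched roughly as follows, assuming a check based only on JDK charsets. `EncodingCheck`, `isAsciiCompatible`, and `requireAsciiCompatible` are hypothetical names, not existing Spark APIs. The check is a heuristic: it only verifies that each of the 128 ASCII characters encodes to the same single byte, so it rejects UTF-16 and UTF-32 but does not prove that multi-byte sequences in an accepted charset never contain delimiter bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {
    // Heuristic: every ASCII character must encode to exactly one byte
    // with the same value. Charsets that cannot encode are rejected.
    static boolean isAsciiCompatible(Charset cs) {
        if (!cs.canEncode()) {
            return false;
        }
        for (char c = 0; c < 128; c++) {
            byte[] b = String.valueOf(c).getBytes(cs);
            if (b.length != 1 || b[0] != (byte) c) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical guard for an "encoding" option value: returns the charset
    // or throws IllegalArgumentException when it is not ASCII-compatible.
    static Charset requireAsciiCompatible(String name) {
        Charset cs = Charset.forName(name);  // throws on unknown names
        if (!isAsciiCompatible(cs)) {
            throw new IllegalArgumentException(
                "Encoding is not ASCII-compatible: " + name);
        }
        return cs;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiCompatible(StandardCharsets.UTF_8));
        System.out.println(isAsciiCompatible(StandardCharsets.UTF_16));
        // For the third option, a test could iterate over every charset
        // the running JVM provides:
        for (Charset cs : Charset.availableCharsets().values()) {
            System.out.println(cs.name() + " -> " + isAsciiCompatible(cs));
        }
    }
}
```

The set returned by `Charset.availableCharsets()` depends on the JVM, so a test suite built on it would cover different encodings on different installations.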
    
    @srowen Would you maybe give some feedback please?
    



