[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

MaxGekk Mon, 09 Apr 2018 02:28:51 -0700

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/20937
  
    @HyukjinKwon Let's sync.
    
    > Automatic encoding detection doesn't work for newlines and schema 
inference when multiLine is disabled
    
    I don't know about you, but I take "doesn't work" to mean that it doesn't 
work in ALL cases. Some of your statements are only partially correct, and some 
are incorrect. Here are counterexamples to this one:
    1. The file is in UTF-8 and multiLine is disabled. Are newlines and the 
schema inferred correctly? Yes.
    2. The file is in ISO 8859-1 and multiLine is disabled. Does it work? Yes.
    3. The encoding is CP1251: the same.
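    
    A minimal sketch of why these three cases can behave the same, assuming a 
line reader that splits the raw bytes on the single byte 0x0A before decoding. 
That assumption is for illustration only and is not taken from the Spark code; 
`splits_cleanly` is a hypothetical helper. UTF-8, ISO 8859-1 and CP1251 all 
encode the newline as that one byte and never use it inside another character, 
while UTF-16LE (added here only for contrast) does not.

```python
def splits_cleanly(lines: list[str], encoding: str) -> bool:
    """Encode `lines`, split the bytes on 0x0A, and check the records survive.

    Hypothetical helper: it mimics a reader that assumes the newline is the
    single byte 0x0A, whatever the file's real encoding is.
    """
    data = ("\n".join(lines) + "\n").encode(encoding)
    try:
        # Split on the raw byte first, then decode each non-empty record.
        records = [chunk.decode(encoding) for chunk in data.split(b"\n") if chunk]
    except UnicodeDecodeError:
        # A record was cut in the middle of a multi-byte code unit.
        return False
    return records == lines


# Sample text differs per encoding because ISO 8859-1 has no Cyrillic letters.
samples = {
    "utf-8": ['{"name": "Привет"}', '{"id": 2}'],
    "iso-8859-1": ['{"name": "café"}', '{"id": 2}'],
    "cp1251": ['{"name": "Привет"}', '{"id": 2}'],
    "utf-16-le": ['{"name": "a"}', '{"id": 2}'],
}

results = {enc: splits_cleanly(lines, enc) for enc, lines in samples.items()}
for enc, ok in results.items():
    print(f"{enc}: {'splits cleanly' if ok else 'records are corrupted'}")
```

    In UTF-16LE the newline is the two bytes 0x0A 0x00, so a byte-level split 
leaves a stray 0x00 at the start of the next record.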
    
    All these examples show that your statement is wrong in the strict 
mathematical sense.
    
    > I thought this PR targets to add the **explicit encoding** support mainly
    
    EXACTLY. I don't know why you are pushing me to do something about 
auto-detection. The PR doesn't change the behavior when `encoding` is not 
specified. The PR is not about supporting every encoding in every case. It is 
about the cases where the user specifies `encoding` explicitly.



---

