Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20937
@HyukjinKwon Let's sync.
> Automatic encoding detection doesn't work for newlines and schema inference when multiLine is disabled
I don't know about you, but I take "doesn't work" to mean that it doesn't
work in ALL cases. Some of your statements are only partially correct, and
some are incorrect. For this one, here are counterexamples:
1. The file is in UTF-8 and `multiLine` is disabled. Are newlines and the
schema inferred correctly? Yes.
2. The file is in ISO 8859-1 and `multiLine` is disabled. Does it work? Yes.
3. The encoding is CP1251: the same.
All these examples show that your statement is wrong in the strict
mathematical sense.
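As a rough illustration of why the three cases above can work, here is a minimal Python sketch. It assumes a simplified line reader that splits the raw bytes on the single byte 0x0A before decoding; that is an assumption made for illustration, not a description of Spark's actual reader. The sample records and the helper names `split_on_lf_byte` and `lines_survive` are hypothetical. The UTF-16LE entry is included only for contrast and is not one of the cases listed above.

```python
# Hypothetical sample records, one set per encoding (not taken from the PR).
SAMPLES = {
    "utf-8": '{"a": "é"}\n{"a": "я"}\n',
    "iso-8859-1": '{"a": "é"}\n{"a": "ü"}\n',
    "cp1251": '{"a": "я"}\n{"a": "ж"}\n',
    "utf-16le": '{"a": "é"}\n{"a": "я"}\n',  # contrast case only
}


def split_on_lf_byte(data):
    """Split raw bytes on the byte 0x0A, dropping one trailing empty piece."""
    parts = data.split(b"\n")
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


def lines_survive(text, encoding):
    """Return True if byte-level splitting recovers the original lines."""
    raw = text.encode(encoding)
    try:
        decoded = [p.decode(encoding) for p in split_on_lf_byte(raw)]
    except UnicodeDecodeError:
        # A piece was cut in the middle of a multi-byte code unit.
        return False
    return decoded == text.splitlines()


if __name__ == "__main__":
    for enc, text in SAMPLES.items():
        print(enc, lines_survive(text, enc))
```

In UTF-8, ISO 8859-1 and CP1251 the newline is encoded as the single byte 0x0A and that byte never appears inside another character, so byte-level splitting keeps every record intact. In UTF-16LE the newline is two bytes, so the same splitter leaves a stray 0x00 at the start of the next piece.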
> I thought this PR targets to add the **explicit encoding** support mainly
EXACTLY. I don't know why you are pushing me to do something about
auto-detection. The PR doesn't change the behavior when `encoding` is not
specified. It is not about supporting every encoding in every case. It is
about the cases where `encoding` is specified explicitly by the user.