[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

HyukjinKwon Mon, 09 Apr 2018 01:01:07 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20937#discussion_r180014167
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -366,6 +366,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `java.text.SimpleDateFormat`. This applies to timestamp type.</li>
        * <li>`multiLine` (default `false`): parse one record, which may span multiple lines,
        * per file</li>
    +   * <li>`encoding` (by default it is not set): allows to forcibly set one of standard basic
    +   * or extended charsets for input jsons. For example UTF-8, UTF-16BE, UTF-32. If the encoding
    +   * is not specified (by default), it will be detected automatically.</li>
    --- End diff --
    
    > If encoding is not set, it will be detected by Jackson independently from multiline.
    
    Jackson detects the encoding, but Spark does not detect it correctly when `multiLine` is disabled, even with this PR, as we discussed. We found many holes. Why are you bringing this up again?


