Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20849#discussion_r175282994
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -85,6 +85,12 @@ private[sql] class JSONOptions(
val multiLine =
parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
+ /**
+ * Standard charset name, for example UTF-8, UTF-16 or UTF-32.
+ * If charset is not specified (None), it will be detected automatically.
--- End diff ---
JSON's schema inference uses the text datasource to separate the lines
before we go through the Jackson parser, where the charset for newlines
should be respected. Wouldn't it be better to fix the text datasource with
Hadoop's line reader first?
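
For context, the diff adds a `charset` option alongside the existing `multiLine` one. A minimal sketch of how such an option could be read from the parameters map, mirroring the `multiLine` pattern shown in the diff — the helper name and exact semantics here are assumptions for illustration, not the API merged in the PR:

```scala
import java.nio.charset.Charset

object CharsetOptionSketch {
  // Hypothetical helper: look up an optional "charset" key and validate it
  // against the JVM's known charsets. None means "detect automatically",
  // as the doc comment added in the diff describes.
  def charsetOf(parameters: Map[String, String]): Option[String] =
    parameters.get("charset").map(name => Charset.forName(name).name())

  def main(args: Array[String]): Unit = {
    println(charsetOf(Map("charset" -> "UTF-8"))) // Some(UTF-8)
    println(charsetOf(Map.empty))                 // None
  }
}
```

Validating eagerly with `Charset.forName` surfaces an unsupported charset as an error at option-parsing time rather than later, deep inside line splitting or Jackson parsing.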
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]