Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20849#discussion_r175281238
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
)
}
}
+
+  def testFile(fileName: String): String = {
+    Thread.currentThread().getContextClassLoader.getResource(fileName).toString
+  }
+
+ test("json in UTF-16 with BOM") {
+ val fileName = "json-tests/utf16WithBOM.json"
+    val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
+ val jsonDF = spark.read.schema(schema)
+ // The mode filters null rows produced because new line delimiter
+ // for UTF-8 is used by default.
--- End diff ---
The test came from a customer's use case: we broke backward compatibility with previous versions by forcibly setting the input stream encoding to UTF-8 in https://github.com/apache/spark/pull/20302 . You can see a test case in that PR where the jackson-json parser is not able to detect the charset correctly.
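To illustrate the "null rows" comment in the test above: the following is a minimal, hypothetical JVM sketch (plain Java, no Spark; the class name and sample records are invented for illustration) of why splitting a UTF-16 stream on the UTF-8 newline byte corrupts every record. In UTF-16BE, `'\n'` is the byte pair `0x00 0x0A`, so a splitter that looks for a lone `0x0A` leaves stray `0x00` bytes glued to the neighbouring records, and neither half parses as valid JSON.

```java
import java.nio.charset.StandardCharsets;

public class Utf16NewlineDemo {
    public static void main(String[] args) {
        // Two JSON records separated by '\n', as in the test file above.
        String json = "{\"firstName\":\"Chris\"}\n{\"firstName\":\"Doug\"}";

        // Java's UTF-16 encoder prepends a big-endian BOM: 0xFE 0xFF.
        byte[] utf16 = json.getBytes(StandardCharsets.UTF_16);
        System.out.printf("first bytes: %02X %02X%n", utf16[0], utf16[1]);

        // A UTF-8 line splitter scans for the single byte 0x0A. It still
        // finds it inside the UTF-16 stream, but the adjacent 0x00 byte
        // stays attached to the record, which then fails JSON parsing
        // (hence the null rows that DROPMALFORMED filters out).
        for (int i = 1; i < utf16.length; i++) {
            if (utf16[i] == 0x0A) {
                System.out.printf("0x0A at offset %d, preceded by 0x%02X%n",
                        i, utf16[i - 1]);
            }
        }
    }
}
```

This is also why the BOM matters: a parser that inspects the first bytes (`FE FF`) can pick the right charset before splitting lines, which is what the jackson auto-detection is supposed to do.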
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]