[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

MaxGekk Sun, 18 Mar 2018 01:40:48 -0700

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20849#discussion_r175281373
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           )
         }
       }
    +
    +  def testFile(fileName: String): String = {
    +    Thread.currentThread().getContextClassLoader.getResource(fileName).toString
    +  }
    +
    +  test("json in UTF-16 with BOM") {
    +    val fileName = "json-tests/utf16WithBOM.json"
    +    val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
    +    val jsonDF = spark.read.schema(schema)
    --- End diff --
    
    No, because Hadoop's LineRecordReader produces many empty strings. This will be
    fixed in separate PRs for SPARK-23725 and/or SPARK-23724. For now, as a temporary
    workaround, you have to specify a schema or use multiline mode.
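    A simplified, hypothetical illustration in plain Scala (no Spark or Hadoop; the
    object `Utf16LineSplit` and its methods are made up for this sketch) of why
    byte-oriented line splitting struggles with UTF-16. A newline is two bytes in
    UTF-16 (0x0A 0x00 in UTF-16LE), so splitting raw bytes on 0x0A misaligns the 16-bit
    code units of every following record. This does not reproduce LineRecordReader's
    exact output; it only shows the misalignment. Decoding the whole input before
    splitting, loosely comparable to multiline mode where the parser sees the entire
    file, keeps the records intact.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not Hadoop's LineRecordReader: a simplified model of a reader
// that splits raw bytes on the single byte 0x0A, as an ASCII/UTF-8 line reader would.
object Utf16LineSplit {

  /** Splits raw bytes on every 0x0A byte; a trailing empty segment is dropped. */
  def splitBytesOnLf(bytes: Array[Byte]): Seq[Array[Byte]] = {
    val segments = ArrayBuffer.empty[Array[Byte]]
    var start = 0
    for (i <- bytes.indices if bytes(i) == 0x0A) {
      segments += bytes.slice(start, i)
      start = i + 1
    }
    if (start < bytes.length) segments += bytes.slice(start, bytes.length)
    segments.toSeq
  }

  /** Charset-aware alternative: decode the whole input first, then split on '\n'. */
  def decodeThenSplit(bytes: Array[Byte], charset: Charset): Seq[String] =
    new String(bytes, charset).split("\n").toSeq.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val text = "{\"a\":1}\n{\"a\":2}\n"
    val utf16le = text.getBytes(StandardCharsets.UTF_16LE)

    // Byte-level split: 3 segments; the 2nd and 3rd start with a stray 00 byte,
    // so the 16-bit code units after the first newline are misaligned.
    splitBytesOnLf(utf16le).foreach { seg =>
      println(seg.map(b => f"${b & 0xFF}%02X").mkString(" "))
    }

    // Decode first, then split: yields the two JSON records unchanged.
    decodeThenSplit(utf16le, StandardCharsets.UTF_16LE).foreach(println)

    // Java's "UTF-16" charset writes a big-endian BOM (FE FF) when encoding and
    // honours/strips it when decoding, so a file with a BOM decodes the same way.
    val withBom = text.getBytes(StandardCharsets.UTF_16)
    decodeThenSplit(withBom, StandardCharsets.UTF_16).foreach(println)
  }
}
```

    For the UTF-16LE input, the byte-level split yields segments of 14, 15 and 1
    bytes, while decoding first returns exactly the two original records.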


