Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20849#discussion_r175281238
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
)
}
}
+
+  def testFile(fileName: String): String = {
+    Thread.currentThread().getContextClassLoader.getResource(fileName).toString
+  }
+
+ test("json in UTF-16 with BOM") {
+ val fileName = "json-tests/utf16WithBOM.json"
+    val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
+ val jsonDF = spark.read.schema(schema)
+ // The mode filters null rows produced because new line delimiter
+ // for UTF-8 is used by default.
--- End diff ---
The test came from a customer's use case: we broke backward compatibility with previous versions by forcibly setting the input stream encoding to UTF-8 in https://github.com/apache/spark/pull/20302 . You can see a test case in that PR where the jackson-json parser is not able to detect the charset correctly.
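To illustrate the "null rows" comment in the test above: the following is a minimal, hypothetical JVM sketch (plain Java, no Spark; the class name and sample records are invented for illustration) of why splitting a UTF-16 stream on the UTF-8 newline byte corrupts every record. In UTF-16BE, `'\n'` is the byte pair `0x00 0x0A`, so a splitter that looks for a lone `0x0A` leaves stray `0x00` bytes glued to the neighbouring records, and neither half parses as valid JSON.

```java
import java.nio.charset.StandardCharsets;

public class Utf16NewlineDemo {
    public static void main(String[] args) {
        // Two JSON records separated by '\n', as in the test file above.
        String json = "{\"firstName\":\"Chris\"}\n{\"firstName\":\"Doug\"}";

        // Java's UTF-16 encoder prepends a big-endian BOM: 0xFE 0xFF.
        byte[] utf16 = json.getBytes(StandardCharsets.UTF_16);
        System.out.printf("first bytes: %02X %02X%n", utf16[0], utf16[1]);

        // A UTF-8 line splitter scans for the single byte 0x0A. It still
        // finds it inside the UTF-16 stream, but the adjacent 0x00 byte
        // stays attached to the record, which then fails JSON parsing
        // (hence the null rows that DROPMALFORMED filters out).
        for (int i = 1; i < utf16.length; i++) {
            if (utf16[i] == 0x0A) {
                System.out.printf("0x0A at offset %d, preceded by 0x%02X%n",
                        i, utf16[i - 1]);
            }
        }
    }
}
```

This is also why the BOM matters: a parser that inspects the first bytes (`FE FF`) can pick the right charset before splitting lines, which is what the jackson auto-detection is supposed to do.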
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]