Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20849#discussion_r175281373
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
)
}
}
+
+ def testFile(fileName: String): String = {
+   Thread.currentThread().getContextClassLoader.getResource(fileName).toString
+ }
+
+ test("json in UTF-16 with BOM") {
+ val fileName = "json-tests/utf16WithBOM.json"
+ val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
+ val jsonDF = spark.read.schema(schema)
--- End diff --
No, because the Hadoop LineRecordReader produces many empty strings. This
will be fixed in separate PRs for SPARK-23725 and/or SPARK-23724.
For now, you have to specify a schema or use multiline mode as a temporary
workaround.
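
A minimal sketch of that workaround, combining both suggestions (an explicit schema plus multiline mode). The object and method names are hypothetical, and the `encoding` option name is an assumption taken from released Spark versions (2.4+); the option name in this revision of the PR may differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical helper, not part of the PR.
object Utf16JsonWorkaround {
  // Same schema as in the test under review.
  val schema: StructType = new StructType()
    .add("firstName", StringType)
    .add("lastName", StringType)

  def readUtf16Json(spark: SparkSession, path: String): DataFrame =
    spark.read
      .schema(schema)               // explicit schema, so nothing is inferred from the input
      .option("multiLine", "true")  // parse each file as a whole instead of line by line
      .option("encoding", "UTF-16") // assumption: option name as in released Spark 2.4+
      .json(path)
}
```

Note that in multiline mode each file is expected to hold a single JSON value (one object, or an array of objects), not one record per line.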