Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20849#discussion_r175282994
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -85,6 +85,12 @@ private[sql] class JSONOptions(
val multiLine =
parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
+ /**
+ * Standard charset name, for example UTF-8, UTF-16 or UTF-32.
+ * If charset is not specified (None), it will be detected automatically.
--- End diff ---
JSON's schema inference uses the text datasource to separate the lines
before we go through the Jackson parser, where the charset for newlines
should be respected. Wouldn't it be better to fix the text datasource with
Hadoop's line reader first?
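
For context, the diff adds a `charset` option alongside the existing `multiLine` one. A minimal sketch of how such an option could be read from the parameters map, mirroring the `multiLine` pattern shown in the diff — the helper name and exact semantics here are assumptions for illustration, not the API merged in the PR:

```scala
import java.nio.charset.Charset

object CharsetOptionSketch {
  // Hypothetical helper: look up an optional "charset" key and validate it
  // against the JVM's known charsets. None means "detect automatically",
  // as the doc comment added in the diff describes.
  def charsetOf(parameters: Map[String, String]): Option[String] =
    parameters.get("charset").map(name => Charset.forName(name).name())

  def main(args: Array[String]): Unit = {
    println(charsetOf(Map("charset" -> "UTF-8"))) // Some(UTF-8)
    println(charsetOf(Map.empty))                 // None
  }
}
```

Validating eagerly with `Charset.forName` surfaces an unsupported charset as an error at option-parsing time rather than later, deep inside line splitting or Jackson parsing.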
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]