Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20937#discussion_r179952240
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -86,14 +85,34 @@ private[sql] class JSONOptions(
   val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
-  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
-    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
-    sep
+  /**
+   * A string between two consecutive JSON records.
+   */
+  val lineSeparator: Option[String] = parameters.get("lineSep")
+
+  /**
+   * Standard encoding (charset) name. For example UTF-8, UTF-16LE and UTF-32BE.
+   * If the encoding is not specified (None), it will be detected automatically.
+   */
+  val encoding: Option[String] = parameters.get("encoding")
+    .orElse(parameters.get("charset")).map { enc =>
+      val blacklist = List("UTF16", "UTF32")
+      val isBlacklisted = blacklist.contains(enc.toUpperCase.replaceAll("-|_", ""))
+      require(multiLine || !isBlacklisted,
+        s"""The ${enc} encoding must not be included in the blacklist:
+           | ${blacklist.mkString(", ")}""".stripMargin)
+
+      val forcingLineSep = !(multiLine == false && enc != "UTF-8" && lineSeparator.isEmpty)
+      require(forcingLineSep,
+        s"""The lineSep option must be specified for the $enc encoding.
+           |Example: .option("lineSep", "|^|")
--- End diff --
I think it is fine to remove this example. Can we just use prose instead,
for example, `'lineSep' option must be explicitly set when 'encoding' option
is specified.`? (Feel free not to use it as is; I was just thinking aloud.)
The example also doesn't cover SQL syntax.
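
To make the suggestion concrete, here is a minimal standalone sketch of the check with a prose-only message. It is not the actual `JSONOptions` class: the `LineSepCheck` object and its `validate` helper are hypothetical names used only for illustration, and the condition is the one from the diff, rewritten without the double negation.

```scala
// Hypothetical standalone sketch; LineSepCheck and validate are illustrative
// names and are not part of Spark.
object LineSepCheck {
  // Returns the requested encoding (if any) when the options are consistent.
  // Throws IllegalArgumentException (via require) when 'lineSep' is missing
  // for a non-UTF-8 encoding in per-line mode.
  def validate(parameters: Map[String, String]): Option[String] = {
    val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
    val lineSeparator = parameters.get("lineSep")

    parameters.get("encoding").orElse(parameters.get("charset")).map { enc =>
      // Same condition as in the diff: per-line mode, non-UTF-8 encoding,
      // and no explicit line separator.
      val lineSepMissing = !multiLine && enc != "UTF-8" && lineSeparator.isEmpty
      require(!lineSepMissing,
        "'lineSep' option must be explicitly set when 'encoding' option is specified.")
      enc
    }
  }
}
```

With this sketch, `LineSepCheck.validate(Map("encoding" -> "UTF-16LE"))` fails with the prose message, while adding `"lineSep" -> "\n"` or `"multiLine" -> "true"` lets the same encoding through. The message names only option keys, so it reads the same whether the options come from the Scala, Python, or SQL side.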