Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21247#discussion_r194117346
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -138,3 +121,40 @@ private[sql] class JSONOptions(
factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars)
}
}
+
+private[sql] class JSONOptionsInRead(
+ @transient override val parameters: CaseInsensitiveMap[String],
+ defaultTimeZoneId: String,
+ defaultColumnNameOfCorruptRecord: String)
+ extends JSONOptions(parameters, defaultTimeZoneId, defaultColumnNameOfCorruptRecord) {
+
+ def this(
+ parameters: Map[String, String],
+ defaultTimeZoneId: String,
+ defaultColumnNameOfCorruptRecord: String = "") = {
+ this(
+ CaseInsensitiveMap(parameters),
+ defaultTimeZoneId,
+ defaultColumnNameOfCorruptRecord)
+ }
+
+ protected override def checkedEncoding(enc: String): String = {
+ // The following encodings are not supported in per-line mode (multiline is false)
+ // because they cause some problems in reading files with BOM which is supposed to
+ // present in the files with such encodings. After splitting input files by lines,
+ // only the first lines will have the BOM which leads to impossibility for reading
+ // the rest lines. Besides of that, the lineSep option must have the BOM in such
+ // encodings which can never present between lines.
+ val blacklist = Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32"))
+ val isBlacklisted = blacklist.contains(Charset.forName(enc))
+ require(multiLine || !isBlacklisted,
--- End diff --
I could be missing something, but it seems we might allow folks to write
data that they can't read back in the same mode they wrote it in (for example,
UTF-16 or UTF-32 output read back with multiLine set to false). Would it make
sense to have an equivalent checkedEncoding on the write side that just logs a
warning? I could also have misunderstood.
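
To make the suggestion concrete, here is a minimal sketch of what a warn-only write-side check could look like. It is an illustration, not the PR's code: the object `WriteSideEncodingCheck`, the method `perLineReadWarning`, and the `warn` callback are hypothetical names, and a real implementation would presumably live in a write-side options class and use Spark's own logging rather than stderr.

```scala
import java.nio.charset.Charset

// Hypothetical stand-in for a write-side options class; not part of Spark.
object WriteSideEncodingCheck {
  // The same charsets that the read-side check in the diff rejects
  // when multiLine is false.
  private val bomDependent: Seq[Charset] =
    Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32"))

  /** Returns a warning message if `enc` is one of the BOM-dependent charsets, else None. */
  def perLineReadWarning(enc: String): Option[String] = {
    val charset = Charset.forName(enc)
    if (bomDependent.contains(charset)) {
      Some(s"Encoding '$enc' is written with a BOM; files in this encoding " +
        "can only be read back with multiLine enabled.")
    } else {
      None
    }
  }

  /**
   * Warn-only counterpart of the read-side check: the encoding is always
   * accepted and returned unchanged, and a warning is emitted when needed.
   * `warn` defaults to stderr here purely for illustration.
   */
  def checkedEncoding(
      enc: String,
      warn: String => Unit = msg => Console.err.println(msg)): String = {
    perLineReadWarning(enc).foreach(warn)
    enc
  }
}
```

Unlike the read-side `require`, this never fails the write; it only tells the user that the output will need multiLine to be read back. Explicit-endian names such as UTF-16LE resolve to different charsets and so would not trigger the warning in this sketch.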
---