Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20727#discussion_r174998682
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
---
@@ -39,9 +39,12 @@ private[text] class TextOptions(@transient private val parameters: CaseInsensiti
*/
val wholeText = parameters.getOrElse(WHOLETEXT, "false").toBoolean
+ val lineSeparator: String = parameters.getOrElse(LINE_SEPARATOR, "\n")
+ require(lineSeparator.nonEmpty, s"'$LINE_SEPARATOR' cannot be an empty string.")
}
private[text] object TextOptions {
val COMPRESSION = "compression"
val WHOLETEXT = "wholetext"
+ val LINE_SEPARATOR = "lineSep"
--- End diff --
My reason is to follow the naming used in other places so that, in practice,
users feel comfortable, which is what I usually weigh more heavily. I really
don't want to spend time researching why the other references used the term
"line".
If we think about plain text, CSV, or JSON, the term "line" can be correct
in a way. We documented http://jsonlines.org/ (even this reference uses the
term "line"). I think, for example, a line can be defined by its separator.
https://github.com/apache/spark/blob/c36fecc3b416c38002779c3cf40b6a665ac4bf13/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala#L1645
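To illustrate what "a line can be defined by its separator" could mean, here is a minimal sketch. It is not the code in this PR: `LineSepSketch` and `splitLines` are hypothetical names, and the only things borrowed from the diff are the "\n" default and the non-empty check.

```scala
import java.util.regex.Pattern

// Hypothetical helper for illustration only; not part of Spark.
object LineSepSketch {
  // Splits `text` into "lines", where a line is whatever sits between
  // occurrences of `lineSep`. Defaults to "\n", as in the diff above.
  def splitLines(text: String, lineSep: String = "\n"): List[String] = {
    // Same guard as in the diff: an empty separator makes no sense.
    require(lineSep.nonEmpty, "'lineSep' cannot be an empty string.")
    // Pattern.quote makes the separator match literally, so "|" or "."
    // are not interpreted as regex syntax. Note that String.split drops
    // trailing empty strings.
    text.split(Pattern.quote(lineSep)).toList
  }
}
```

With this sketch, `LineSepSketch.splitLines("a|b|c", "|")` treats `|` as the boundary, so each of the three pieces is a "line" even though the input contains no newline character.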
---