[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...

HyukjinKwon Thu, 15 Mar 2018 22:05:56 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20727#discussion_r174998682
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala ---
    @@ -39,9 +39,12 @@ private[text] class TextOptions(@transient private val parameters: CaseInsensiti
        */
       val wholeText = parameters.getOrElse(WHOLETEXT, "false").toBoolean
     
    +  val lineSeparator: String = parameters.getOrElse(LINE_SEPARATOR, "\n")
    +  require(lineSeparator.nonEmpty, s"'$LINE_SEPARATOR' cannot be an empty string.")
     }
     
     private[text] object TextOptions {
       val COMPRESSION = "compression"
       val WHOLETEXT = "wholetext"
    +  val LINE_SEPARATOR = "lineSep"
    --- End diff --
    
    My reason is to follow the naming used in other places so that users feel 
comfortable in practice, which is what I usually weigh most heavily. I really 
don't want to spend time researching why those other references use the term 
"line".
    
    If we think about plain text, CSV, or JSON, the term "line" can be 
correct in a way. Our documentation refers to http://jsonlines.org/, and even 
that reference uses the term "line". I think, for example, a line can be 
defined by its separator.
    
    
https://github.com/apache/spark/blob/c36fecc3b416c38002779c3cf40b6a665ac4bf13/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala#L1645
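
    To illustrate the point that a line can be defined by its separator, here 
is a minimal sketch. `LineSepSketch` and `splitLines` are hypothetical names 
used only for illustration and are not part of Spark or of this PR. Only the 
non-empty check and the "\n" default mirror the diff above.

```scala
import java.util.regex.Pattern

object LineSepSketch {
  // Hypothetical helper, not Spark code: a "line" is whatever sits
  // between two occurrences of lineSep, whatever that separator is.
  def splitLines(text: String, lineSep: String = "\n"): Seq[String] = {
    // Same guard as in the diff: an empty separator cannot delimit anything.
    require(lineSep.nonEmpty, "'lineSep' cannot be an empty string.")
    // Quote the separator so it is matched literally, not as a regex.
    val parts = text.split(Pattern.quote(lineSep), -1).toSeq
    // Assumption: a trailing separator terminates the last line
    // instead of starting a new, empty one.
    if (parts.nonEmpty && parts.last.isEmpty) parts.init else parts
  }
}
```

    Under these assumptions, `LineSepSketch.splitLines("a||b", "||")` yields 
the same two "lines" as `LineSepSketch.splitLines("a\nb\n")`, so the term 
still fits when the separator is not a newline.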


