Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20727#discussion_r172341859
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala ---
@@ -39,9 +39,12 @@ private[text] class TextOptions(@transient private val parameters: CaseInsensiti
*/
val wholeText = parameters.getOrElse(WHOLETEXT, "false").toBoolean
+ val lineSeparator: String = parameters.getOrElse(LINE_SEPARATOR, "\n")
+ require(lineSeparator.nonEmpty, s"'$LINE_SEPARATOR' cannot be an empty string.")
}
private[text] object TextOptions {
val COMPRESSION = "compression"
val WHOLETEXT = "wholetext"
+ val LINE_SEPARATOR = "lineSep"
--- End diff ---
Why is it not "lineSeparator"? I would propose another name for the option:
recordSeparator. Imagine you have the following text file:
```
id: 123
cmd: ls -l
---
id: 456
cmd: rm -rf
```
where the separator is `---`. If the separator is not the newline delimiter, the
records don't look like lines, and recordSeparator would be closer to Hadoop's
terminology. Besides that, we will probably introduce a similar option for other
datasources such as JSON; a recordSeparator for JSON records (not lines) sounds
better from my point of view.
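
For illustration, a minimal sketch of how such an option could be used from the
reader API, assuming it ships under the key `lineSep` as written in this diff
(with `recordSeparator` only the option key would change); the input path is
hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object CustomSeparatorExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomSeparatorExample")
      .master("local[*]")
      .getOrCreate()

    // Each row of the resulting "value" column holds one record delimited by
    // "---" (e.g. the "id: 123 / cmd: ls -l" block), not a single line.
    val records = spark.read
      .option("lineSep", "---")       // option key proposed in this PR
      .text("/path/to/records.txt")   // hypothetical input path

    records.show(truncate = false)
    spark.stop()
  }
}
```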
---