Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20727#discussion_r172682591
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -42,7 +52,12 @@ class HadoopFileLinesReader(
       Array.empty)
     val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
     val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
-    val reader = new LineRecordReader()
+    val reader = if (lineSeparator != "\n") {
+      new LineRecordReader(lineSeparator.getBytes("UTF-8"))
--- End diff ---
I mean, it's initially a Unicode string coming in via the datasource interface, and we need to convert it to bytes at some point since `LineRecordReader` takes bytes. Do you mean adding another option for specifying the charset, or did I maybe miss something?
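For illustration, here is a minimal, self-contained sketch of the point above (the U+2028 separator value and the `SeparatorBytesDemo` name are my own, not from the PR): the same separator String encodes to different delimiter bytes depending on the charset, which is exactly what a hypothetical charset option would have to control before the bytes reach `LineRecordReader`:

```scala
import java.nio.charset.StandardCharsets

object SeparatorBytesDemo extends App {
  // The separator arrives from the datasource option as a Unicode String;
  // U+2028 (LINE SEPARATOR) here is just an illustrative value.
  val lineSeparator = "\u2028"

  // LineRecordReader's constructor takes a byte[] delimiter, so the String
  // must be encoded exactly once, and the resulting bytes depend on the charset:
  def hex(bytes: Array[Byte]): String = bytes.map(b => f"${b & 0xff}%02x").mkString(" ")

  println(hex(lineSeparator.getBytes(StandardCharsets.UTF_8)))     // e2 80 a8
  println(hex(lineSeparator.getBytes(StandardCharsets.UTF_16LE)))  // 28 20
}
```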