[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

MaxGekk Tue, 22 May 2018 05:22:59 -0700

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21380#discussion_r189879531
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -300,14 +302,11 @@ private[csv] object UnivocityParser {
           lines
         }
     
    -    val filteredLines: Iterator[String] =
    -      CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options)
    --- End diff --
    
    I wrote a test in https://github.com/apache/spark/pull/21394 that passes 
on the current implementation but fails on this PR. After this PR, lines 
consisting of multiple whitespace characters are no longer ignored. To ignore 
such lines, you need to set `ignoreLeadingWhiteSpace` to `true`. See 
https://github.com/uniVocity/univocity-parsers/blob/v2.6.3/src/main/java/com/univocity/parsers/csv/CsvParser.java#L106-L110
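
    To illustrate the behavior change, here is a minimal, self-contained 
sketch of the kind of pre-parse filtering that the removed call performed. 
It is a simplified stand-in and not Spark's actual code: the object name, the 
helper signature (a plain `Char` instead of `CSVOptions`), and the sample 
input are assumptions made for illustration only.

```scala
// Simplified stand-in for a pre-parse line filter; NOT Spark's actual implementation.
object BlankLineFilterSketch {
  // Hypothetical helper: keeps a line only if it is non-empty after trimming
  // and does not start with the comment character.
  def filterCommentAndEmpty(lines: Iterator[String], comment: Char): Iterator[String] =
    lines.filter(line => line.trim.nonEmpty && !line.startsWith(comment.toString))

  def main(args: Array[String]): Unit = {
    // Sample input (assumed): a data row, a whitespace-only line, a comment, a data row.
    val input = List("a,b", "   ", "# note", "1,2")
    val kept = filterCommentAndEmpty(input.iterator, '#').toList
    // The whitespace-only line and the comment line are dropped before parsing.
    println(kept)
  }
}
```

    With a filter like this in place, a whitespace-only line never reaches 
the parser. Without it, whether such a line is skipped depends on the 
parser's own settings, which is why `ignoreLeadingWhiteSpace` matters here.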



---
