[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

MaxGekk Tue, 22 May 2018 01:56:12 -0700

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21380#discussion_r189823948
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -300,14 +302,11 @@ private[csv] object UnivocityParser {
           lines
         }
     
    -    val filteredLines: Iterator[String] =
    -      CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options)
    --- End diff --
    
    Guys, it seems to me this is speculation about a bug. If the code was added
to work around a bug in a previous version of the uniVocity parser, where are
the tests for that? As far as I understand, @HyukjinKwon, we currently filter
CSV rows on top of uniVocity only because we don't trust the library. Following
that logic, we would have to duplicate other uniVocity functionality in the CSV
datasource as well. That looks pretty crazy.


