GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21380

    [SPARK-24329][SQL] Remove comments filtering before parsing of CSV files

    ## What changes were proposed in this pull request?
    
    Filtering of comments and whitespace is already performed by the uniVocity
    parser itself, according to the parser settings built here:
    
    https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L178-L180
    
    There is no need to repeat the same filtering before the uniVocity parser is
    invoked. In this PR, I propose to remove the filtering of whitespace and
    comments (the call of filterCommentAndEmpty) in the parseIterator method of
    the UnivocityParser object.
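    
    For illustration, a minimal, self-contained sketch (not the Spark code
    itself; the comment character and the sample input are made up) of how
    uniVocity's CsvParserSettings already drop comment lines and empty lines,
    which is why the Spark-side pre-filtering is redundant:
    
    ```scala
    import java.io.StringReader
    import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
    
    object UnivocityCommentSkipSketch {
      def main(args: Array[String]): Unit = {
        val settings = new CsvParserSettings()
        // Lines starting with the comment character are dropped by the parser.
        settings.getFormat.setComment('#')
        // Empty lines are skipped by the parser as well.
        settings.setSkipEmptyLines(true)
    
        val input =
          """# a comment line that the parser drops
            |a,1
            |
            |b,2
            |""".stripMargin
    
        val parser = new CsvParser(settings)
        val rows = parser.parseAll(new StringReader(input))
    
        // Only the two data rows remain; no extra filtering is needed upstream.
        rows.forEach(row => println(row.mkString(",")))
      }
    }
    ```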
    
    ## How was this patch tested?
    
    The changes were tested by CSVSuite.
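    
    As a hedged sketch of the kind of behavior CSVSuite exercises (the file
    path below is hypothetical and this is not the actual suite code), reading
    with the comment option must still drop commented lines once the
    pre-filtering is removed:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    object CommentOptionCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("csv-comment-option-check")
          .master("local[*]")
          .getOrCreate()
    
        // Lines starting with '#' should still be dropped by the uniVocity
        // parser itself, even without the removed Spark-side pre-filtering.
        val df = spark.read
          .option("comment", "#")
          .csv("src/test/resources/comments.csv") // hypothetical test file path
    
        df.show()
        spark.stop()
      }
    }
    ```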


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 delete-comment-filtering

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21380.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21380
    
----
commit 014cfe9542f8ccdc634097fbebda21b9eb99ab7b
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-05-20T21:16:02Z

    Removing of unnecessary comments filtering.

commit 36522689f9579ec05e7d69d1d7bd1f507f6bdbc0
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-05-20T21:18:20Z

    Cleanup remove of comments

----


---
