GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21394

    [SPARK-24329][SQL] Test for skipping multi-space lines

    ## What changes were proposed in this pull request?
    
    This PR is a continuation of https://github.com/apache/spark/pull/21380. It 
adds a test for the cases handled by the following code:
    
https://github.com/apache/spark/blob/e3de6ab30d52890eb08578e55eb4a5d2b4e7aa35/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala#L303-L304
    
    In short, the code skips lines that consist only of one or more whitespace 
characters, and lines that start with the comment character (see 
[filterCommentAndEmpty](https://github.com/apache/spark/blob/e3de6ab30d52890eb08578e55eb4a5d2b4e7aa35/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala#L47)):
    
    ```scala
    iter.filter { line =>
      line.trim.nonEmpty && !line.startsWith(options.comment.toString)
    }
    ```
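
    To illustrate which lines the predicate above keeps, here is a minimal 
    standalone sketch. It is not the Spark code itself: the `FilterSketch` 
    object, the `comment` parameter (standing in for `options.comment`), and 
    the sample input are assumptions made only for this example.

    ```scala
    // Standalone sketch of the predicate quoted above (not the Spark source).
    // `comment` is a hypothetical stand-in for `options.comment`.
    object FilterSketch {
      def filterCommentAndEmpty(lines: Iterator[String], comment: Char): Iterator[String] =
        lines.filter { line =>
          // Keep a line only if it has non-whitespace content
          // and does not begin with the comment character.
          line.trim.nonEmpty && !line.startsWith(comment.toString)
        }

      def main(args: Array[String]): Unit = {
        // Sample input: data, an empty line, single- and multi-space lines, a comment.
        val input = Seq("a,b", "", " ", "   ", "#comment", "1,2")
        val kept = filterCommentAndEmpty(input.iterator, '#').toList
        println(kept) // prints List(a,b, 1,2)
      }
    }
    ```

    Note that in this sketch `startsWith` is applied to the untrimmed line, so 
    a comment preceded by whitespace (for example `" #comment"`) is kept.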
    ## How was this patch tested?
    
    Added a test for the case described above.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 test-for-multi-space-lines

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21394
    
----
commit b0f73e5f5dda5ec74c91dad07f50f9960402cc82
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-05-22T11:59:51Z

    Test checks skipping lines with comments, and one or multiple whitespaces

----


---
