GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23080

    [SPARK-26108][SQL] Support custom lineSep in CSV datasource

    ## What changes were proposed in this pull request?
    
    In this PR, I propose a new option for the CSV datasource, `lineSep`, similar to
the one in the Text and JSON datasources. The option lets users specify a custom
line separator of at most 2 characters (because of a restriction in the
`uniVocity` parser). The new option can be used for both reading and writing CSV files.
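
To illustrate the semantics described above without depending on Spark, here is a minimal pure-Python sketch of a CSV roundtrip with a custom record separator and the 2-character limit. It is not the PR's implementation: the helper names `validate_line_sep`, `write_csv` and `read_csv` are hypothetical, and rejecting an empty separator is an assumption not stated in the description.

```python
import csv
import io

# Limit stated in the PR description (attributed to the uniVocity parser).
MAX_LINE_SEP_LEN = 2


def validate_line_sep(line_sep):
    """Return line_sep if it satisfies the length restriction, else raise."""
    if not line_sep:
        # Assumption: an empty separator is treated as invalid.
        raise ValueError("lineSep cannot be an empty string")
    if len(line_sep) > MAX_LINE_SEP_LEN:
        raise ValueError(
            f"lineSep can contain at most {MAX_LINE_SEP_LEN} characters"
        )
    return line_sep


def write_csv(rows, line_sep):
    """Serialize rows to CSV text, terminating each record with line_sep."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator=validate_line_sep(line_sep))
    writer.writerows(rows)
    return buf.getvalue()


def read_csv(text, line_sep):
    """Split text on line_sep and parse each record as one CSV line.

    Simplification: quoted fields that contain the separator (the
    multiLine case mentioned in the PR) are not handled here.
    """
    validate_line_sep(line_sep)
    records = [record for record in text.split(line_sep) if record]
    return list(csv.reader(records))
```

In Spark itself the option would presumably be passed the same way as for the Text and JSON datasources, e.g. `.option("lineSep", "\r\n")` on the reader or writer.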
    
    ## How was this patch tested?
    
    Added a few read tests with a custom `lineSep`, with `multiLine` both enabled
and disabled, as well as write tests. I also added roundtrip tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 csv-line-sep

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23080
    
----
commit a790bb30e575cf6d4ffaeda307f0405f1bfecf03
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-17T21:44:47Z

    Added a test for default line separator

commit 7a47990af7a9e8782fbde2955c0cf6e4848a3806
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-17T21:56:34Z

    Test for custom lineSep

commit be2870f1006c3f2e783cec0c40bd6e1c7e4c5652
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T09:59:07Z

    Test on read

commit a058a6f2d6771173837ba4b6e829b2067993adb7
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T10:33:12Z

    Support lineSep in write

commit 7e3c0264ae93e270ed8b63c53897a2b775fa65ff
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T10:36:17Z

    Check roundtrip

commit 486b090139ce6d7a93a24edae000fb546b4931db
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T10:42:08Z

    Test another char

commit a0fedbbb06f33716fc632d3b4dd2a687b2587966
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T11:03:20Z

    Don't keep quotes

commit 5f013f505e7a57e4f72f6f1185f1dcdedc0960b5
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T11:13:38Z

    Support 2 chars as lineSep

commit 65786dfabbb5c901e3f8d32f737a6b24a2f58b6b
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T11:14:22Z

    Revert unrelated changes

commit 49b91ea06b757a2feed283de1634c36a59ace8f0
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T11:26:19Z

    Test restrictions for lineSep

commit 12022ad1a0194a4bab9007d66145071562e066a4
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-18T11:39:12Z

    Updating comments and docs

----


---

