[GitHub] spark pull request #23091: [SPARK-26122][SQL] Support encoding for multiLine...

MaxGekk Mon, 19 Nov 2018 13:28:42 -0800

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23091


    [SPARK-26122][SQL] Support encoding for multiLine in CSV datasource

    ## What changes were proposed in this pull request?
    
    In this PR, I propose passing the CSV option `encoding`/`charset` to the
    `uniVocity` parser so that CSV files in different encodings can be parsed
    when `multiLine` is enabled. The value of the option is passed to the
    `beginParsing` method of `CSVParser`.
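
To illustrate why the charset has to reach the parser in multi-line mode, here is a minimal sketch in plain Python. It is not Spark or uniVocity code: `parse_multiline_csv` and the sample data are hypothetical, and the standard `csv` module stands in for the real parser. The point is only that a record spanning several lines cannot be split on line breaks first, so the whole byte stream must be decoded with the requested encoding before parsing.

```python
import csv
import io


def parse_multiline_csv(raw: bytes, encoding: str = "UTF-8", header: bool = True):
    """Decode the whole byte stream with `encoding`, then parse CSV records
    whose quoted fields may contain line breaks (hypothetical helper)."""
    # newline="" disables newline translation so the csv module itself decides
    # which line breaks end a record and which belong to a quoted field.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    rows = list(csv.reader(text))
    if header and rows:
        names, data = rows[0], rows[1:]
        # Map each data row onto the column names taken from the first row.
        return [dict(zip(names, row)) for row in data]
    return rows


# Assumed sample input: one quoted field spans two lines, one has a non-ASCII character.
sample = 'id,comment\n1,"first line\nsecond line"\n2,"café"\n'
raw = sample.encode("UTF-16LE")

records = parse_multiline_csv(raw, encoding="UTF-16LE", header=True)
rows_without_header = parse_multiline_csv(raw, encoding="UTF-16LE", header=False)
```

With the header enabled, the first row supplies the column names; with it disabled, every row, including the first, is returned as a plain list of fields.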
    
    ## How was this patch tested?
    
    Added a new test to `CSVSuite` that covers different encodings, with the
    header both enabled and disabled.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 csv-miltiline-encoding

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23091
    
----
commit 1a7a0cb4430f847ac95c0c764393003581415103
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T20:51:04Z

    Added a test

commit cd57ec5833bbfb5f0b33d63a56b48d25924f6be1
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:07:41Z

    Test multiple encodings

commit 1c76f8944979df8a7b9b8181ebfa38933c3f2c00
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:09:04Z

    Pass encoding to uniVocity parser

commit 16eb14c73f3fad8d83fee41d5665b52f180daf73
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:22:23Z

    Test with header and without it

----


