GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22528
[SPARK-25513][SQL] Read zipped CSV and JSON
## What changes were proposed in this pull request?
This PR adds support for reading zip archives that contain **one**
CSV or JSON file, in multi-line mode.
## How was this patch tested?
Added tests for CSV and JSON in which the zip archives are created by a Java library.
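
The idea of reading an archive that holds a single CSV or JSON file can be illustrated with a minimal sketch. It is written in Python with the standard `zipfile` module and is not the code from this patch. The helper names `make_zip` and `read_single_entry`, the file name `CARS.CSV`, and the sample rows are hypothetical, chosen only for illustration. The sketch assumes the archive is rejected when it does not contain exactly one file, and that the extension is matched without regard to case.

```python
import csv
import io
import zipfile


def make_zip(name, text):
    """Build an in-memory zip archive holding one text file (a test fixture)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, text)
    return buffer.getvalue()


def read_single_entry(archive_bytes, extensions=(".csv", ".json")):
    """Return (name, text) for the only file stored in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Ignore directory entries; only regular files are counted.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
        if len(names) != 1:
            raise ValueError(f"expected exactly one file, found {len(names)}")
        name = names[0]
        # Assumption: the extension check ignores case, so CARS.CSV is accepted.
        if not name.lower().endswith(extensions):
            raise ValueError(f"unsupported entry: {name}")
        return name, archive.read(name).decode("utf-8")


# Round trip: create the archive, read the entry back, parse it as CSV.
payload = make_zip("CARS.CSV", "year,make\n2012,Tesla\n")
name, text = read_single_entry(payload)
rows = list(csv.reader(io.StringIO(text)))
```

The whole entry is read in one piece before parsing, which loosely mirrors why a multi-line mode suits a compressed archive: a zip entry cannot be split at arbitrary byte offsets.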
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 read-zipped-csv-json
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22528.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22528
----
commit a926d277e0cecb4d2d66e6500a68e656da6e1d2f
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-22T19:49:44Z
Support zip archives
commit 29716248b1ef504ab828c6b8af8ac78f1013923a
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-22T19:49:59Z
Add test for zipped CSV files
commit 149e452d17cffecb024c29771dc05322295ba437
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-22T19:52:18Z
Fix imports
commit 1dff39eb7e06435551ab7ba0d0443b106e60e4b6
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-22T19:57:10Z
Added a test for zipped JSON
commit 09dff81b34600c05a3b30a135c32e9dcd40e5bae
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-22T19:58:56Z
Refactoring of the CSV test
commit 5fda51a3505437c4a32f146940a908cd1557bbf5
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-22T20:02:37Z
Make extension checking case agnostic
----