GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/10615
[SPARK-12420][SQL] Have a built-in CSV data source implementation
CSV is the most common data format in the "small data" world. It is often
the first format people want to try when they see Spark on a single node.
Having to rely on a third-party component for this leads to a poor user
experience for new users. This PR merges the popular spark-csv data source
package (https://github.com/databricks/spark-csv) into Spark SQL.
This is the first PR to bring the functionality to Spark 2.0 master. We will
complete the items outlined in the design document (see JIRA attachment) in
follow-up pull requests.
Spark-csv was developed and maintained by several members of the open
source community:
@dtpeacock: Type inference
@mohitjaggi: Integration with uniVocity-parsers
@JoshRosen: Build and style checking
@aley: Support for comments
@pashields: Support for compression codecs
@HyukjinKwon: Several bug fixes
@rbolkey: Tests and improvements
@huangjs: Updating API
@vlyubin: Support for insert
@brkyvz: Test refactoring
@rxin: Documentation
@andy327: Null values
@yhuai: Documentation
@akirakw: Documentation
@dennishuo: Documentation
@petro-rudenko: Increasing max characters per-column
@saurfang: Documentation
@kmader: Tests
@cvengros: Documentation
@MarkRijckenberg: Documentation
@msperlich: Improving compression codec handling
@thoralf-gutierrez: Documentation
@lebigot: Documentation
@sryza: Python documentation
@xguo27: Documentation
@darabos: License text in build file
@jamesblau: Nullable quote character
@karma243: Java documentation
@gasparms: Improving double and float type cast
@MarcinKosinski: R documentation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/falaki/spark SPARK-12420
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10615.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10615
----
commit f3e99bde657ece010929e04f622ccdf75588af0d
Author: Hossein <[email protected]>
Date:   2016-01-06T00:40:13Z

    Added univocity-parsers as a dependency

commit 29c15c84b511363f47dca327f791f2c5e28ffcea
Author: Hossein <[email protected]>
Date:   2016-01-06T00:42:20Z

    Added inline implementation of spark-csv in SparkSQL

commit c9900d800fddb69a74f54dcd3b1dfc0afea8e8ee
Author: Hossein <[email protected]>
Date:   2016-01-06T00:48:07Z

    Minor style and comments with some TODOs

commit da314cb9cb323b5800175e15a49fe48f5c5c5e75
Author: Hossein <[email protected]>
Date:   2016-01-06T06:37:30Z

    Ported tests from spark-csv
----