[GitHub] spark pull request: [SPARK-12420][SQL] Have a built-in CSV data so...

falaki Tue, 05 Jan 2016 23:05:58 -0800

GitHub user falaki opened a pull request:

    https://github.com/apache/spark/pull/10615


    [SPARK-12420][SQL] Have a built-in CSV data source implementation

    CSV is the most common data format in the "small data" world. It is often 
the first format people want to try when they see Spark on a single node. 
Having to rely on a third-party component for this leads to a poor user 
experience for new users. This PR merges the popular spark-csv data source 
package (https://github.com/databricks/spark-csv) with SparkSQL.
    
    This is a first PR to bring the functionality to Spark 2.0 master. We will 
complete the items outlined in the design document (see JIRA attachment) in 
follow-up pull requests.
    
    Spark-csv was developed and maintained by several members of the 
open-source community:
    @dtpeacock: Type inference
    @mohitjaggi: Integration with uniVocity-parsers 
    @JoshRosen: Build and style checking 
    @aley: Support for comments
    @pashields: Support for compression codecs
    @HyukjinKwon: Several bug fixes
    @rbolkey: Tests and improvements
    @huangjs: Updating API
    @vlyubin: Support for insert
    @brkyvz: Test refactoring
    @rxin: Documentation
    @andy327: Null values
    @yhuai: Documentation
    @akirakw: Documentation
    @dennishuo: Documentation
    @petro-rudenko: Increasing max characters per column
    @saurfang: Documentation
    @kmader: Tests
    @cvengros: Documentation
    @MarkRijckenberg: Documentation
    @msperlich: Improving compression codec handling
    @thoralf-gutierrez: Documentation
    @lebigot: Documentation
    @sryza: Python documentation
    @xguo27: Documentation
    @darabos: License text in build file
    @jamesblau: Nullable quote character
    @karma243: Java documentation
    @gasparms: Improving double and float type cast
    @MarcinKosinski: R documentation

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/falaki/spark SPARK-12420

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10615
    
----
commit f3e99bde657ece010929e04f622ccdf75588af0d
Author: Hossein <[email protected]>
Date:   2016-01-06T00:40:13Z

    Added univocity-parsers as a dependency

commit 29c15c84b511363f47dca327f791f2c5e28ffcea
Author: Hossein <[email protected]>
Date:   2016-01-06T00:42:20Z

    Added inline implementation of spark-csv in SparkSQL

commit c9900d800fddb69a74f54dcd3b1dfc0afea8e8ee
Author: Hossein <[email protected]>
Date:   2016-01-06T00:48:07Z

    Minor style and comments with some TODOs

commit da314cb9cb323b5800175e15a49fe48f5c5c5e75
Author: Hossein <[email protected]>
Date:   2016-01-06T06:37:30Z

    Ported tests from spark-csv

----



