GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/10766
[SPARK-12833][SQL] Initial import of spark-csv
CSV is the most common data format in the "small data" world. It is often
the first format people want to try when they see Spark on a single node.
Having to rely on a third-party component for this leads to a poor user experience
for new users. This PR merges the popular spark-csv data source package
(https://github.com/databricks/spark-csv) with SparkSQL.
This is the first PR to bring the functionality to Spark 2.0 master. We will
complete the items outlined in the design document (see JIRA attachment) in
follow-up pull requests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rxin/spark csv
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10766.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10766
----
commit f3e99bde657ece010929e04f622ccdf75588af0d
Author: Hossein <[email protected]>
Date: 2016-01-06T00:40:13Z
Added univocity-parsers as a dependency
commit 29c15c84b511363f47dca327f791f2c5e28ffcea
Author: Hossein <[email protected]>
Date: 2016-01-06T00:42:20Z
Added inline implementation of spark-csv in SparkSQL
commit c9900d800fddb69a74f54dcd3b1dfc0afea8e8ee
Author: Hossein <[email protected]>
Date: 2016-01-06T00:48:07Z
Minor style and comments with some TODOs
commit da314cb9cb323b5800175e15a49fe48f5c5c5e75
Author: Hossein <[email protected]>
Date: 2016-01-06T06:37:30Z
Ported tests from spark-csv
commit e85cd7d708dd8c7f175d936606b9744c4d7c5727
Author: Hossein <[email protected]>
Date: 2016-01-06T10:02:27Z
Excluding test resource files from license check
commit b09c38c65aeb92566df79d31dd14cda5dc0bb262
Author: Hossein <[email protected]>
Date: 2016-01-06T10:02:54Z
Adding test resource file for nullable types
commit b31cb893dfcd87d1269a4a932d34fed830fe55ce
Author: Hossein <[email protected]>
Date: 2016-01-06T10:08:08Z
Remove debugging message and extra lines
commit e364c284f2d37540aa2487220b417fa433198361
Author: Hossein <[email protected]>
Date: 2016-01-06T20:12:24Z
Updated deps
commit 1856ed33dc4b677b0f3c83f61c100640c3f8e801
Author: Hossein <[email protected]>
Date: 2016-01-06T22:38:45Z
Using Hive numericPrecedence
commit 0fd4bd3cd177e23c46db56b2a08a12b85c57355f
Author: Hossein <[email protected]>
Date: 2016-01-07T00:15:24Z
Not using depricated DecimalType constructor
commit 1e312a525c85ec08f2aa76870fe812716f6699a0
Author: Hossein <[email protected]>
Date: 2016-01-07T06:39:05Z
Further style improvement
commit 319e0edb17d02eb994bc1cd104a29df8c47a9c59
Author: Hossein <[email protected]>
Date: 2016-01-07T22:08:16Z
Fixing write test
commit c448be766c8dddbd25fcba5d817bf66b976e0b5a
Author: Reynold Xin <[email protected]>
Date: 2016-01-15T06:55:15Z
Merge pull request #10615 from falaki/SPARK-12420
[SPARK-12420][SQL] Have a built-in CSV data source implementation
commit ff22a2c883e8236f911bd583b7e9a4da66d6e980
Author: Reynold Xin <[email protected]>
Date: 2016-01-15T07:08:19Z
Fix scala style and add notice file.
----