GitHub user goldmedal opened a pull request:
https://github.com/apache/spark/pull/19339
[SPARK-22112][PYSPARK] Add an API to create a DataFrame from RDD[String]
storing CSV
## What changes were proposed in this pull request?
We added a method to the Scala API for creating a `DataFrame` from a
`Dataset[String]` storing CSV in
[SPARK-15463](https://issues.apache.org/jira/browse/SPARK-15463), but PySpark
doesn't have `Dataset` to support this feature. Therefore, I add an API to
create a `DataFrame` from an `RDD[String]` storing CSV; it is also consistent
with PySpark's `spark.read.json`.
For example:
```
>>> rdd = sc.textFile('python/test_support/sql/ages.csv')
>>> df2 = spark.read.csv(rdd)
>>> df2.dtypes
[('_c0', 'string'), ('_c1', 'string')]
```
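Conceptually, each string element of the RDD is treated as one CSV record and parsed into string columns. The sketch below illustrates that per-record parsing in plain Python, without Spark; the function name `parse_csv_lines` is a hypothetical stand-in, since the actual API delegates to Spark's CSV data source.

```python
import csv

def parse_csv_lines(lines):
    """Parse an iterable of CSV-formatted strings into tuples of columns.

    Illustrative sketch only: mimics how spark.read.csv(rdd) treats each
    string element as one CSV record with all-string columns, matching
    the ('_c0', 'string'), ('_c1', 'string') dtypes shown above.
    """
    return [tuple(row) for row in csv.reader(lines)]

rows = parse_csv_lines(["Joe,20", "Tom,30"])
print(rows)  # [('Joe', '20'), ('Tom', '30')]
```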
## How was this patch tested?
Added unit test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/goldmedal/spark SPARK-22112
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19339.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19339
----
commit d557892080c8d6ec33dd7a13f4b8cdad88b440b0
Author: goldmedal <[email protected]>
Date: 2017-09-25T09:31:36Z
add csv from `RDD[String]` API and related test case
commit baaa93f5e837cdba02922e183a3f81c287e19854
Author: goldmedal <[email protected]>
Date: 2017-09-25T09:50:34Z
fix test case
commit d4ef30abdda142a969400c9e6e11a089a5483385
Author: goldmedal <[email protected]>
Date: 2017-09-25T11:59:08Z
finish pyspark dataframe from rdd of csv string
commit 9bd4eed474fdfa20d5933558d519fb187694aa33
Author: goldmedal <[email protected]>
Date: 2017-09-25T12:13:50Z
modified comments
commit 7525b48d2b9b59b1d6ce74a145fc049cfce6529a
Author: goldmedal <[email protected]>
Date: 2017-09-25T12:14:55Z
modified comments
----
---