GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22954
[DO-NOT-MERGE][POC] Enables Arrow optimization from R DataFrame to Spark DataFrame
## What changes were proposed in this pull request?
This PR is not intended for merging; it demonstrates the feasibility
(reusing the PyArrow code path where possible) and the performance improvement of
converting R data.frames into Spark DataFrames. This can be tested as below:
```bash
$ ./bin/sparkR --conf spark.sql.execution.arrow.enabled=true
```
```r
collect(createDataFrame(mtcars))
```
**Requirements:**
- R 3.5.x
- Arrow package 0.12+ (not released yet)
  - CRAN release tracked in ARROW-3204
- withr package
**TODOs:**
- [ ] Performance measurement
- [ ] TBD
## How was this patch tested?
A small test was added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark r-arrow-createdataframe
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22954.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22954
----
commit 90011a5ff48f2c5fa5fae0e2573fcdaa85d44976
Author: hyukjinkwon <gurwls223@...>
Date: 2018-11-06T02:38:37Z
[POC] Enables Arrow optimization from R DataFrame to Spark DataFrame
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]