GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/20902
[SPARK-23770][R] Exposes repartitionByRange in SparkR
## What changes were proposed in this pull request?
This PR proposes to expose `repartitionByRange`.
```R
> df <- createDataFrame(iris)
...
> getNumPartitions(repartitionByRange(df, 3, col = df$Species))
[1] 3
```
## How was this patch tested?
Tested manually, and unit tests were added. The difference from `repartition`
can be seen below:
```R
> df <- createDataFrame(mtcars)
> take(repartition(df, 10, df$wt), 3)
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
2 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
3 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
> take(repartitionByRange(df, 10, df$wt), 3)
   mpg cyl disp hp drat    wt  qsec vs am gear carb
1 30.4   4 75.7 52 4.93 1.615 18.52  1  1    4    2
2 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
3 27.3   4 79.0 66 4.08 1.935 18.90  1  1    4    1
```
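The output above is consistent with `repartition` hashing the `wt` column while `repartitionByRange` groups rows into contiguous ranges of `wt`, so the first partition holds the smallest weights. The following is a minimal conceptual sketch in base R of that difference. It does not use SparkR and is not how Spark computes its partition bounds; the partition count `n`, the quantile-based bounds, and the toy hash are all assumptions made for illustration.

```R
# Conceptual sketch only: base R, no Spark session required.
wt <- mtcars$wt
n <- 4  # hypothetical number of partitions

# Range-style partitioning: cut the value range at quantile boundaries,
# so each partition covers one contiguous interval of wt.
bounds <- quantile(wt, probs = seq_len(n - 1) / n)
range_part <- findInterval(wt, bounds, left.open = TRUE) + 1

# Hash-style partitioning: the partition id depends only on a hash of the
# value (a toy hash here), so neighbouring values land in unrelated partitions.
hash_part <- (round(wt * 1000) %% n) + 1

# Min and max of wt per partition; the intervals are disjoint and ordered
# only in the range-style case.
range_span <- tapply(wt, range_part, range)
hash_span <- tapply(wt, hash_part, range)
print(range_span)
print(hash_span)
```

In the range-style result each partition's maximum lies below the next partition's minimum, which is the property that makes range partitioning useful ahead of sorted output or range-based filtering.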
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark r-repartitionByRange
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20902.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20902
----
commit 264b50e9f480647d0a807e8c591a7a36944322ce
Author: hyukjinkwon <gurwls223@...>
Date: 2018-03-26T04:37:53Z
Expose repartitionByRange in SparkR
----