GitHub user jiangxb1987 opened a pull request:
https://github.com/apache/spark/pull/22026
[SPARK-25045][CORE] Make `RDDBarrier.mapParititions` similar to
`RDD.mapPartitions`
## What changes were proposed in this pull request?
Signature of the function passed to `RDDBarrier.mapPartitions()` is
different from that of `RDD.mapPartitions`. The latter doesnât take a
TaskContext. We shall make the function signature the same to avoid confusion
and misusage.
This PR proposes the following API changes:
- In `RDDBarrier`, migrate `mapPartitions` from
```
def mapPartitions[S: ClassTag](
f: (Iterator[T], BarrierTaskContext) => Iterator[S],
preservesPartitioning: Boolean = false): RDD[S]
}
```
to
```
def mapPartitions[S: ClassTag](
f: Iterator[T] => Iterator[S],
preservesPartitioning: Boolean = false): RDD[S]
}
```
- Add new static method to get a `BarrierTaskContext`:
```
object BarrierTaskContext {
def get(): BarrierTaskContext
}
```
## How was this patch tested?
Existing test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jiangxb1987/spark mapPartitions
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22026.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22026
----
commit 46be7c420960a3375d2187ccee1a9fc3d5ef83e6
Author: Xingbo Jiang <xingbo.jiang@...>
Date: 2018-08-07T14:50:17Z
update RDDBarrier.mapPartitions()
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]