Correct; that's the completely degenerate case where you can't do anything
in parallel.  Often you'll also want your iterator function to send back
some information to an accumulator (perhaps just the result calculated from
the last element of the partition), which is then fed back into the
operation on the next partition as either a broadcast variable or part of
the closure.
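A minimal sketch of that feed-forward pattern, using plain Scala collections to stand in for an RDD's partitions (the actual Spark job, broadcast mechanics, and the data here are assumptions for illustration):

```scala
// Sketch: process "partitions" sequentially, carrying each partition's
// last result into the next partition's operation via the closure.
// A Seq[Seq[Int]] stands in for an RDD's partitions.
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6))

// Fold over partitions in order: `carry` is the accumulator (the result
// from the last element of the previous partition), entering the per-
// partition map through the closure.
val (results, _) = partitions.foldLeft((Seq.empty[Seq[Int]], 0)) {
  case ((done, carry), part) =>
    val mapped = part.map(_ + carry)   // carry enters via the closure
    (done :+ mapped, mapped.last)      // send back the last result
}
// results == Seq(Seq(1, 2, 3), Seq(7, 8), Seq(14))
```

In real Spark you'd run one job per partition (as in the snippet below) and ship `carry` to the executors either as a broadcast variable or simply by capturing it in the closure, as described above.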



On Tue, Oct 22, 2013 at 3:25 PM, Nathan Kronenfeld <
[email protected]> wrote:

> You shouldn't have to fly data around
>
> You can just run it first on partition 0, then on partition 1, etc...  I
> may have the name slightly off, but something approximately like:
>
> for (p <- 0 until numPartitions)
>   data.mapPartitionsWithIndex((i, iter) =>
>     if (i == p) iter.map(fcn) else Iterator.empty
>   ).foreach(_ => ())  // mapPartitionsWithIndex is lazy; an action forces it
>
> should work... BUT that being said, you've now really lost the point of
> using Spark to begin with.
>
>
