Or you can simply use `reduceByKeyLocally` if you don't want to worry about
implementing accumulators and the like, assuming the reduced values will fit
in the driver's memory (which you are already assuming by using
accumulators).
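
For instance, a minimal sketch (the RDD name `pairs` and its contents are
made up for illustration):

    // a small pair RDD; any RDD[(K, V)] works the same way
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKeyLocally merges the values for each key and returns the
    // result to the driver as a Map, without producing a shuffled RDD
    val counts = pairs.reduceByKeyLocally(_ + _)
    // counts == Map("a" -> 4, "b" -> 2)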

Best,
Burak

On Thu, May 21, 2015 at 2:46 PM, ben <delpizz...@gmail.com> wrote:

> Hi, everybody.
>
> There are some cases in which I can obtain the same results by using either
> the mapPartitions or the foreach method.
>
> For example, in a typical MapReduce approach one would perform a reduceByKey
> immediately after a mapPartitions that transforms the original RDD into a
> collection of (key, value) tuples. I think it is possible to achieve the
> same result by using, for instance, an array of accumulators, where each
> executor adds its values at a given index and the index itself serves as
> the key.
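>
> For instance, a rough sketch (names made up, assuming integer keys in
> 0 until n so that a key can double as an array index):
>
>     // shuffle-based variant
>     val reduced = rdd.mapPartitions(_.map(x => (x % n, 1L)))
>                      .reduceByKey(_ + _)
>
>     // accumulator-based variant: one accumulator per key index,
>     // summed as a side effect of foreach and read back on the driver
>     val accs = Array.fill(n)(sc.accumulator(0L))
>     rdd.foreach(x => accs(x % n) += 1L)
>     val sums = accs.map(_.value)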
>
> Since reduceByKey performs a shuffle (writing intermediate data to disk), I
> think that, when possible, the foreach approach should be better, even
> though foreach has the side effect of adding a value to an accumulator.
>
> I am asking this to check whether my reasoning is correct. I hope I was
> clear.
> Thank you, Beniamino
