You should *never* use accumulators for this purpose because you may get
incorrect answers. An accumulator update can be applied more than once, for
example when a task is re-executed, so you cannot rely on the correctness of
the values it computes. See SPARK-732
<https://issues.apache.org/jira/browse/SPARK-732> for more info.
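
To make the difference concrete, here is a minimal sketch in plain Python. It
is not Spark code: the aggregate function below is a hypothetical stand-in
that only mimics the shape of RDD.aggregate (zero value, per-partition fold,
merge of partials), and the global counter is a stand-in for an accumulator.
The data and all names are made up for illustration.

```python
from functools import reduce


def aggregate(partitions, zero, seq_op, comb_op):
    """Hypothetical stand-in for RDD.aggregate: fold each partition
    with seq_op, then merge the per-partition results with comb_op."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


# Aggregated state: (count, sum, sum of squares).
ZERO = (0, 0.0, 0.0)


def seq_op(acc, x):
    # Fold one element into a partition-local result.
    n, s, ss = acc
    return (n + 1, s + x, ss + x * x)


def comb_op(a, b):
    # Merge the results of two partitions.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(partitions):
    n, s, ss = aggregate(partitions, ZERO, seq_op, comb_op)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return n, mean, variance


# Made-up data, split into two "partitions".
partitions = [[1.0, 2.0], [3.0, 4.0]]
n, mean, variance = mean_and_variance(partitions)

# A side-effecting counter, standing in for an accumulator.
side_counter = 0


def counting_task(part):
    global side_counter
    for _ in part:
        side_counter += 1


for part in partitions:
    counting_task(part)
counting_task(partitions[0])  # simulate the first task being re-executed

print(n, mean, variance)  # result derived only from returned values
print(side_counter)       # exceeds n because one task ran twice
```

The aggregated result depends only on the values each partition returns, so
running a partition again just recomputes the same partial result. The
counter is updated as a side effect, so the re-run is counted a second time.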

On Sun, Nov 16, 2014 at 10:06 PM, Segerlind, Nathan L <
[email protected]> wrote:

> Hi All.
>
> I am trying to get my head around why using accumulators and accumulables
> seems to be the most recommended method for accumulating running sums,
> averages, variances and the like, whereas the aggregate method seems to me
> to be the right one. I have no performance measurements as of yet, but it
> seems that aggregate is simpler and more intuitive (And it does what one
> might expect an accumulator to do) whereas the accumulators and
> accumulables seem to have some extra complications and overhead.
>
> So…
>
> What’s the real difference between an accumulator/accumulable and
> aggregating an RDD? When is one method of aggregation preferred over the
> other?
>
> Thanks,
>
> Nate
>



-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

54 W 40th St, New York, NY 10018
E: [email protected] W: www.velos.io
