Re: RDD.aggregate versus accumulables...

Surendranauth Hiraman Mon, 17 Nov 2014 08:58:37 -0800

We use Algebird for calculating things like min/max, stddev, variance, etc.


https://github.com/twitter/algebird/wiki
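The idea behind Algebird-style statistics is that each statistic is a monoid: an associative `plus` with an identity element. Partial results from different partitions can then be merged in any grouping, which is exactly the shape `RDD.aggregate` needs. The sketch below is plain Python, not Algebird's Scala API. The `Moments` class and the local `aggregate` helper are hypothetical names; the helper only mimics the seqOp/combOp contract of `RDD.aggregate` over a list of partitions. It tracks count, min, max, mean and the sum of squared deviations (merged with the standard parallel-variance update), from which variance and stddev follow.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    """Hypothetical monoid element: count, min, max, mean and m2
    (sum of squared deviations from the mean)."""
    count: int = 0
    min_value: float = math.inf
    max_value: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0

    @staticmethod
    def of(x: float) -> "Moments":
        # Moments of a single observation.
        return Moments(1, x, x, float(x), 0.0)

    def plus(self, other: "Moments") -> "Moments":
        # Associative merge; Moments() (count 0) is the identity element.
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Moments(n, min(self.min_value, other.min_value),
                       max(self.max_value, other.max_value), mean, m2)

    def variance(self) -> float:
        # Population variance; divide by (count - 1) for the sample variance.
        return self.m2 / self.count if self.count else math.nan

    def stddev(self) -> float:
        return math.sqrt(self.variance())


def aggregate(partitions, zero, seq_op, comb_op):
    """Local stand-in for RDD.aggregate: fold each partition with seq_op
    starting from zero, then merge the per-partition results with comb_op."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
stats = aggregate(
    partitions,
    Moments(),
    lambda acc, x: acc.plus(Moments.of(x)),  # seqOp: add one element
    lambda a, b: a.plus(b),                  # combOp: merge two partials
)
print(stats.count, stats.min_value, stats.max_value, stats.mean, stats.stddev())
```

Because `plus` is associative and `Moments()` is its identity, the result does not depend on how the data is partitioned. In PySpark, the same zero value and the same two functions could, we assume, be passed to `rdd.aggregate(zeroValue, seqOp, combOp)` unchanged.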

-Suren


On Mon, Nov 17, 2014 at 11:32 AM, Daniel Siegmann wrote:

> You should *never* use accumulators for this purpose because you may get
> incorrect answers. Accumulators can count the same thing multiple times
> (for example, when a task or stage is re-executed), so you cannot rely on
> the correctness of the values they compute. See
> SPARK-732 <https://issues.apache.org/jira/browse/SPARK-732> for more info.
>
> On Sun, Nov 16, 2014 at 10:06 PM, Segerlind, Nathan L wrote:
>
>> Hi all,
>>
>> I am trying to get my head around why accumulators and accumulables seem
>> to be the most recommended method for accumulating running sums,
>> averages, variances and the like, whereas the aggregate method seems to me
>> to be the right one. I have no performance measurements yet, but aggregate
>> seems simpler and more intuitive (and it does what one might expect an
>> accumulator to do), whereas accumulators and accumulables seem to carry
>> some extra complications and overhead.
>>
>> So…
>>
>> What’s the real difference between an accumulator/accumulable and
>> aggregating an RDD? When is one method of aggregation preferred over the
>> other?
>>
>> Thanks,
>>
>> Nate
>>
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 54 W 40th St, New York, NY 10018
> W: www.velos.io
>



-- 

SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io
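Daniel's point about accumulators counting the same thing multiple times can be illustrated without a cluster. The sketch below is a toy model, not Spark code. The hypothetical `LazyRDD` class re-runs its map function every time an action evaluates the data, much as an uncached RDD is recomputed for each action or a failed stage is re-executed. The map function also bumps a shared `Counter` (a stand-in for an accumulator) as a side effect. The value computed with `aggregate` is derived from the data and stays correct; the side-effect counter grows with every evaluation.

```python
from functools import reduce


class Counter:
    """Stand-in for an accumulator: a shared value mutated by side effects."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> None:
        self.value += n


class LazyRDD:
    """Toy model of an uncached RDD (hypothetical, not Spark's API): the map
    function is re-run every time an action evaluates the data."""

    def __init__(self, partitions, fn=lambda x: x):
        self.partitions = partitions
        self.fn = fn

    def map(self, fn):
        prev = self.fn
        return LazyRDD(self.partitions, lambda x: fn(prev(x)))

    def compute(self):
        # Recompute every partition from scratch, like a re-executed stage.
        return [[self.fn(x) for x in part] for part in self.partitions]

    def aggregate(self, zero, seq_op, comb_op):
        partials = [reduce(seq_op, part, zero) for part in self.compute()]
        return reduce(comb_op, partials, zero)

    def count(self):
        return sum(len(part) for part in self.compute())


records_seen = Counter()


def parse(x):
    records_seen.add(1)  # side effect inside a transformation
    return x * 10


rdd = LazyRDD([[1, 2, 3], [4, 5]]).map(parse)

total = rdd.aggregate(0, lambda acc, x: acc + x, lambda a, b: a + b)
n = rdd.count()  # second action: parse() runs again on every element

print(total, n, records_seen.value)  # 150 5 10
```

Five records were processed, but the counter reports ten. In Spark itself, caching the RDD would avoid the recomputation in this particular case. However, cached partitions can still be evicted and tasks can still be retried, and Spark only guarantees once-per-task accumulator updates for updates made inside actions. A result computed with `aggregate` is therefore the safer source of truth.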
