We use Algebird for calculating things like min/max, stddev, variance, etc.
https://github.com/twitter/algebird/wiki

-Suren

On Mon, Nov 17, 2014 at 11:32 AM, Daniel Siegmann <[email protected]> wrote:

> You should *never* use accumulators for this purpose because you may get
> incorrect answers. Accumulators can count the same thing multiple times -
> you cannot rely upon the correctness of the values they compute. See
> SPARK-732 <https://issues.apache.org/jira/browse/SPARK-732> for more info.
>
> On Sun, Nov 16, 2014 at 10:06 PM, Segerlind, Nathan L <[email protected]> wrote:
>
>> Hi All.
>>
>> I am trying to get my head around why using accumulators and accumulables
>> seems to be the most recommended method for accumulating running sums,
>> averages, variances and the like, whereas the aggregate method seems to me
>> to be the right one. I have no performance measurements as of yet, but it
>> seems that aggregate is simpler and more intuitive (And it does what one
>> might expect an accumulator to do) whereas the accumulators and
>> accumulables seem to have some extra complications and overhead.
>>
>> So…
>>
>> What’s the real difference between an accumulator/accumulable and
>> aggregating an RDD? When is one method of aggregation preferred over the
>> other?
>>
>> Thanks,
>>
>> Nate
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 54 W 40th St, New York, NY 10018
> E: [email protected] W: www.velos.io

--
SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io
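To make the aggregate approach Nate describes concrete, here is a minimal sketch (not code from anyone in this thread) of the kind of mergeable summary that RDD.aggregate relies on, and that Algebird packages as monoids. It assumes PySpark's `rdd.aggregate(zeroValue, seqOp, combOp)`; the names `Stats`, `ZERO`, `seq_op`, `comb_op` and `variance` are hypothetical, and the partitions are simulated with plain lists so it runs without Spark. `seq_op` is Welford's running update, and `comb_op` is the parallel merge formula of Chan et al.

```python
from functools import reduce
from typing import NamedTuple


class Stats(NamedTuple):
    """Mergeable summary: count, running mean, sum of squared deviations (M2), min, max."""
    n: int
    mean: float
    m2: float
    lo: float
    hi: float


# Identity element: merging ZERO with any Stats returns that Stats unchanged.
ZERO = Stats(0, 0.0, 0.0, float("inf"), float("-inf"))


def seq_op(acc: Stats, x: float) -> Stats:
    """Fold one value into a partition's running summary (Welford's update)."""
    n = acc.n + 1
    delta = x - acc.mean
    mean = acc.mean + delta / n
    m2 = acc.m2 + delta * (x - mean)
    return Stats(n, mean, m2, min(acc.lo, x), max(acc.hi, x))


def comb_op(a: Stats, b: Stats) -> Stats:
    """Merge two partition summaries (Chan et al. parallel combination)."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Stats(n, mean, m2, min(a.lo, b.lo), max(a.hi, b.hi))


def variance(s: Stats) -> float:
    """Population variance; use s.m2 / (s.n - 1) for the sample variance."""
    return s.m2 / s.n if s.n else float("nan")


if __name__ == "__main__":
    # Stand-in for an RDD's partitions; with PySpark the equivalent call
    # would be rdd.aggregate(ZERO, seq_op, comb_op).
    partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
    per_partition = [reduce(seq_op, part, ZERO) for part in partitions]
    total = reduce(comb_op, per_partition, ZERO)
    print(total.n, total.mean, variance(total), total.lo, total.hi)
```

The design point is that the answer is built from the values each task returns, not from side effects on shared state: a retried task just recomputes its own partition summary. For this to work, `comb_op` must be associative and `ZERO` must be its identity, because Spark applies the zero value per partition and may merge partition results in any order.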
