And Lord Joe, you were right: future versions did protect accumulators in actions. I wonder if anyone has a "modern" take on the accumulator vs. aggregate question. It seems like if I need to do it by key, or to control partitioning, I would use aggregate (a quick sketch of the by-key case below).
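A minimal sketch of what I mean by the by-key case, assuming the results can be keyed somehow; addResult() and merge() are placeholder names for whatever DropEvaluation would actually expose:

    import org.apache.spark.api.java.JavaPairRDD;

    JavaPairRDD<String, DropResult> keyed = /* however the results get keyed */ null;

    // one DropEvaluation per key; Spark hands each new key a fresh copy of the
    // zero value, so mutating it inside the lambdas is safe
    JavaPairRDD<String, DropEvaluation> perKey = keyed.aggregateByKey(
            new DropEvaluation(),
            (eval, result) -> { eval.addResult(result); return eval; },  // fold one result in
            (a, b) -> { a.merge(b); return a; });                        // merge partial results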
Bottom line question / reason for post: I wonder if anyone has more ideas about using aggregate instead. Am I right to think an accumulable's value is always present on the driver, whereas an aggregate's result needs to be pulled back to the driver manually?

Details: both give me a way to write custom add and merge operations. For example, here is the class I am stubbing out:

    import org.apache.spark.AccumulableParam;

    class DropEvalAccumulableParam implements AccumulableParam<DropEvaluation, DropResult> {

        // Add additional data to the accumulator value. Is allowed to modify and
        // return r (the first value) for efficiency, to avoid allocating objects.
        @Override
        public DropEvaluation addAccumulator(DropEvaluation dropEvaluation, DropResult dropResult) {
            return null; // TODO: fold dropResult into dropEvaluation and return it
        }

        // Merge two accumulated values together. Is allowed to modify and return
        // the first value for efficiency, to avoid allocating objects.
        @Override
        public DropEvaluation addInPlace(DropEvaluation masterDropEval, DropEvaluation r1) {
            return null; // TODO: merge r1 into masterDropEval and return it
        }

        // Return the "zero" (identity) value for an accumulator type, given its
        // initial value. For example, if R were a vector of N dimensions, this
        // would return a vector of N zeroes.
        @Override
        public DropEvaluation zero(DropEvaluation dropEvaluation) {
            // technically the "additive identity" of a DropEvaluation would be
            // an empty DropEvaluation
            return dropEvaluation;
        }
    }
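To make the comparison concrete, here is a minimal sketch of both call sites as I currently understand them, assuming a JavaSparkContext and an RDD of DropResult are in scope, and again using the placeholder addResult()/merge() helpers on DropEvaluation:

    import org.apache.spark.Accumulable;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    JavaSparkContext sc = /* existing context */ null;
    JavaRDD<DropResult> results = /* however the results are produced */ null;

    // Accumulable: created on (and owned by) the driver; tasks ship deltas
    // back, and the current value is only readable on the driver.
    Accumulable<DropEvaluation, DropResult> acc =
            sc.accumulable(new DropEvaluation(), new DropEvalAccumulableParam());
    results.foreach(acc::add);  // foreach is an action, so each task's update counts once
    DropEvaluation viaAccumulable = acc.value();

    // aggregate: also an action; the merged value is simply the return value,
    // so "pulling it back to the driver" is just making the call.
    DropEvaluation viaAggregate = results.aggregate(
            new DropEvaluation(),
            (eval, result) -> { eval.addResult(result); return eval; },  // within a partition
            (a, b) -> { a.merge(b); return a; });                        // across partitions

If that is right, the accumulable mainly buys visibility on the driver while the job runs, and aggregate mainly buys the per-key / partitioning control mentioned above. Corrections welcome.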