groupByKey vs reduceByKey

Appu K Fri, 09 Dec 2016 04:19:38 -0800

Hi,

I read somewhere that

groupByKey() on an RDD disables map-side aggregation, because the aggregation
function (appending to a list) does not save any space.
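Here is a minimal pure-Python sketch of that claim. It is not Spark code, and `map_side_group` is a hypothetical helper. It simulates a map-side combine for groupByKey within one partition. Grouping turns the partition's (key, value) records into one (key, [values]) record per key, but every value is still written to the shuffle. The number of values crossing the network is therefore unchanged.

```python
from collections import defaultdict

# Hypothetical helper: simulates what a map-side combine for groupByKey
# would do within ONE map partition (this is not Spark's implementation).
def map_side_group(partition):
    groups = defaultdict(list)
    for key, value in partition:
        groups[key].append(value)  # the "aggregation" only appends
    return dict(groups)

partition = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]

combined = map_side_group(partition)
values_without_combine = len(partition)
values_with_combine = sum(len(vs) for vs in combined.values())

print(combined)                # {'a': [1, 3, 4], 'b': [2, 5]}
print(values_without_combine)  # 5
print(values_with_combine)     # 5: every value is still shuffled
```

Only the repeated keys are collapsed. On top of that, each list adds its own object overhead.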


However, from my understanding, using something like reduceByKey or
combineByKey (with a combiner function) reduces the amount of data
shuffled across the network.
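Unlike appending to a list, folding values with an associative function does shrink the data. This is a minimal pure-Python simulation of the map-side combine that reduceByKey performs. Again, it is not Spark code, and `map_side_reduce` is a hypothetical helper. Each partition merges values for the same key before the shuffle, so it writes at most one record per distinct key. The reduce side then merges the partial results with the same function.

```python
import operator

# Hypothetical helper: simulates reduceByKey's map-side combine within ONE
# partition by folding values for the same key with an associative function.
def map_side_reduce(partition, func):
    combined = {}
    for key, value in partition:
        combined[key] = func(combined[key], value) if key in combined else value
    return combined

partitions = [
    [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)],
    [("a", 10), ("b", 20), ("b", 30)],
]

# Records each map partition would write to the shuffle.
shuffled = [map_side_reduce(p, operator.add) for p in partitions]
records_without_combine = sum(len(p) for p in partitions)
records_with_combine = sum(len(c) for c in shuffled)

# Reduce side: merge the partial results per key with the same function.
final = {}
for partial in shuffled:
    for key, value in partial.items():
        final[key] = operator.add(final[key], value) if key in final else value

print(records_without_combine)  # 8
print(records_with_combine)     # 4
print(final)                    # {'a': 18, 'b': 57}
```

The saving comes from the combined value (here a single integer) being no larger than one input value. A list that keeps every value does not have that property.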

I'm wondering why map-side aggregation is disabled for groupByKey(), and why
it wouldn't save space at the executor that receives the data after the
shuffle.


cheers
Appu
