Hi,

I read somewhere that groupByKey() on an RDD disables map-side aggregation, because its aggregation function (appending each value to a list) does not save any space. However, my understanding is that with something like reduceByKey(), or combineByKey() with a combiner function, we can reduce the amount of data that gets shuffled. I'm wondering why map-side aggregation is disabled for groupByKey(), and why it wouldn't save space on the executor that receives the data after the shuffle.

Cheers,
Appu
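P.S. Here is a toy pure-Python simulation of a single map-side partition that shows the comparison I have in mind. It is not Spark code: the data and the helpers `combine_sum`, `combine_list` and `values_shuffled` are hypothetical. A reduceByKey-style combiner keeps one running value per key. A groupByKey-style combiner can only append to a list, so as far as I can tell every original value would still end up being shuffled.

```python
from collections import defaultdict

# Hypothetical contents of one map-side partition: (key, value) records.
partition = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]

def combine_sum(records):
    """reduceByKey-style map-side combine: one running sum per key."""
    acc = {}
    for k, v in records:
        acc[k] = acc.get(k, 0) + v
    return acc

def combine_list(records):
    """groupByKey-style map-side combine: append every value to a per-key list."""
    acc = defaultdict(list)
    for k, v in records:
        acc[k].append(v)
    return dict(acc)

def values_shuffled(combined):
    """Count how many individual values would be sent across the shuffle."""
    return sum(len(v) if isinstance(v, list) else 1 for v in combined.values())

summed = combine_sum(partition)    # one value per key
grouped = combine_list(partition)  # every value retained inside lists

print(len(partition), values_shuffled(summed), values_shuffled(grouped))
# prints: 5 2 5
```

In this toy case the summing combiner cuts five values down to two, while the list combiner still ships all five, just wrapped in per-key lists.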