I'm not sure what you mean by “mean of RDD by key”. Is this what you want?

    val meansByKey = rdd
      .map { case (k, v) => k -> (v, 1) }
      .reduceByKey { (lhs, rhs) => (lhs._1 + rhs._1, lhs._2 + rhs._2) }
      // mapValues keeps the key; if v is an integral type, use
      // sum.toDouble / count to avoid integer division
      .mapValues { case (sum, count) => sum / count }
      .collectAsMap()

With this, you need to be careful about overflow of the running sum, though.

On Tue, Apr 1, 2014 at 10:55 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

> Hi all;
>
> Can someone give me some tips to compute mean of RDD by key, maybe with
> combineByKey and StatCount.
>
> Cheers,
>
> Jaonary
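
Since the question mentions combineByKey and a stat counter, here is a rough sketch along those lines. It assumes the values are Double and that the class meant is org.apache.spark.util.StatCounter; the helper name meansByKey and its signature are made up for illustration. StatCounter keeps a running mean rather than a raw sum, so it should also ease the overflow concern.

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older Spark versions
import org.apache.spark.rdd.RDD
import org.apache.spark.util.StatCounter

import scala.reflect.ClassTag

// Hypothetical helper: per-key mean of an RDD of (key, Double) pairs.
def meansByKey[K: ClassTag](rdd: RDD[(K, Double)]): scala.collection.Map[K, Double] =
  rdd
    .combineByKey(
      (v: Double) => StatCounter(v),                  // start a counter from the first value of a key
      (acc: StatCounter, v: Double) => acc.merge(v),  // fold another value into a partition-local counter
      (a: StatCounter, b: StatCounter) => a.merge(b)) // combine counters from different partitions
    .mapValues(_.mean)
    .collectAsMap()
```

The same counters also expose count, variance and stdev, so you could keep the StatCounter per key instead of calling mean if you need more than the average.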