Not very sure about the meaning of “mean of RDD by key”, is this what you
want?
val meansByKey = rdd
.map { case (k, v) =>
k -> (v, 1)
}
.reduceByKey { (lhs, rhs) =>
(lhs._1 + rhs._1, lhs._2 + rhs._2)
}
.map { case (sum, count) =>
sum / count
}
.collectAsMap()
With this, you need to be careful about overflow though.
On Tue, Apr 1, 2014 at 10:55 PM, Jaonary Rabarisoa <[email protected]>wrote:
> Hi all;
>
> Can someone give me some tips to compute mean of RDD by key , maybe with
> combineByKey and StatCount.
>
> Cheers,
>
> Jaonary
>