I'm not quite sure what you mean by “mean of RDD by key”. Is this what you
want?

val meansByKey = rdd
  .map { case (k, v) =>
    k -> (v, 1)
  }
  .reduceByKey { (lhs, rhs) =>
    (lhs._1 + rhs._1, lhs._2 + rhs._2)
  }
  .mapValues { case (sum, count) =>
    // elements here are (key, (sum, count)); toDouble avoids integer division
    sum.toDouble / count
  }
  .collectAsMap()

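You need to be careful about overflow with this, though: if the values are
Ints, the running sum can exceed Int.MaxValue for a large group, so convert
them to Long or Double before summing.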
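Since the question mentions combineByKey and StatCounter, here is a rough sketch of that variant. It assumes the values are already Doubles; the object name MeanByKey and the method name meansByKey are just placeholders I made up. StatCounter accumulates in Double and keeps a running mean, so it should avoid the integer overflow problem, but I have not run this against a large data set.

```scala
import scala.reflect.ClassTag

import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on older Spark releases
import org.apache.spark.rdd.RDD
import org.apache.spark.util.StatCounter

object MeanByKey {
  // Hypothetical helper: per-key mean via one StatCounter per key.
  def meansByKey[K: ClassTag](rdd: RDD[(K, Double)]): scala.collection.Map[K, Double] =
    rdd
      .combineByKey(
        (v: Double) => new StatCounter().merge(v),      // first value seen for a key
        (acc: StatCounter, v: Double) => acc.merge(v),  // fold a value into the counter
        (a: StatCounter, b: StatCounter) => a.merge(b)  // combine counters across partitions
      )
      .mapValues(_.mean)
      .collectAsMap()
}
```

If you also need the count, variance or stdev per key, drop the mapValues step and keep the StatCounter itself as the value.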


On Tue, Apr 1, 2014 at 10:55 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

> Hi all;
>
> Can someone give me some tips to compute mean of RDD by key , maybe with
> combineByKey and StatCount.
>
> Cheers,
>
> Jaonary
>
