Surprisingly, this turned out to be more complicated than I expected.

I had the impression that this would be trivial in Spark. Am I missing
something here?


On Tue, Jan 21, 2014 at 5:42 AM, Aureliano Buendia <[email protected]> wrote:

> Hi,
>
> It seems Spark does not support nested RDDs, so I was wondering how Spark
> can handle multi-dimensional reductions.
>
> As an example consider a dataset with these rows:
>
> ((i, j), value)
>
> where i and j are long indexes, and value is a double.
>
> How is it possible to first reduce the above RDD over j, and then reduce
> the results over i?
>
> Just to clarify, a scala equivalent would look like this:
>
> var result = 1.0
> for (i <- 0 until I) {
>   var jReduction = 0.0
>   for (j <- 0 until J) {
>     // Reduce over j (a sum)
>     jReduction = jReduction + rdd(i, j)
>   }
>   // Reduce over i (a product; it must start at 1.0, not 0, or it is always zero)
>   result = result * jReduction
> }
>
>
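
For what it's worth, here is a minimal sketch (not a tested implementation) of how this two-stage reduction might be expressed with plain RDD operations, assuming the sum-over-j / product-over-i semantics of the pseudocode above. No nested RDDs are needed: the first reduceByKey collapses the j dimension into one value per i, and the second reduce runs over the resulting ordinary RDD of per-i sums. The object name, variable names, and toy input are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD functions (needed on pre-1.3 Spark)

object TwoStageReduction {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("two-stage-reduction").setMaster("local[*]"))

    // Toy input in the ((i, j), value) shape from the question.
    val rdd = sc.parallelize(Seq(
      ((0L, 0L), 1.0), ((0L, 1L), 2.0),
      ((1L, 0L), 3.0), ((1L, 1L), 4.0)))

    // First reduction: sum the values over j, keyed by i.
    val reducedOverJ = rdd
      .map { case ((i, _), value) => (i, value) }
      .reduceByKey(_ + _)

    // Second reduction: multiply the per-i sums together.
    val result = reducedOverJ.values.reduce(_ * _)

    println(result)  // (1.0 + 2.0) * (3.0 + 4.0) = 21.0
    sc.stop()
  }
}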
