Surprisingly, this turned out to be more complicated than I expected. I had the impression that this would be trivial in Spark. Am I missing something here?
On Tue, Jan 21, 2014 at 5:42 AM, Aureliano Buendia <[email protected]> wrote:

> Hi,
>
> It seems Spark does not support nested RDDs, so I was wondering how
> Spark can handle multi-dimensional reductions.
>
> As an example, consider a dataset with these rows:
>
> ((i, j), value)
>
> where i and j are long indexes, and value is a double.
>
> How is it possible to first reduce the above RDD over j, and then
> reduce the results over i?
>
> Just to clarify, a sequential Scala equivalent would look like this:
>
> var result = 1.0
> for (i <- 0 until I) {
>   var jReduction = 0.0
>   for (j <- 0 until J) {
>     // Reduce over j
>     jReduction = jReduction + rdd(i, j)
>   }
>   // Reduce over i
>   result = result * jReduction
> }
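For reference, one way to express this without nested RDDs is to re-key the data between the two reduce stages, so each stage is an ordinary one-dimensional reduction. Below is a minimal sketch (the sample input, object name, and local SparkContext are made up for illustration): the first reduceByKey sums over j within each i, and the final reduce multiplies the per-i sums together, mirroring the loop above.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object TwoLevelReduce {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "two-level-reduce")

    // Hypothetical input: rows of ((i, j), value).
    val rdd = sc.parallelize(Seq(
      ((0L, 0L), 1.0), ((0L, 1L), 2.0),
      ((1L, 0L), 3.0), ((1L, 1L), 4.0)))

    // Reduce over j: drop j from the key and sum values within each i.
    val jReductions = rdd
      .map { case ((i, j), value) => (i, value) }
      .reduceByKey(_ + _)

    // Reduce over i: multiply the per-i sums together,
    // matching the result * jReduction step in the loop above.
    val result = jReductions.values.reduce(_ * _)

    println(result)
    sc.stop()
  }
}

The point is that a two-dimensional reduction flattens into two one-dimensional ones: each stage only changes the key, so nothing needs to be nested.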
