Hi,
It seems Spark does not support nested RDDs, so I was wondering how
Spark can handle multi-dimensional reductions.
As an example, consider a dataset with rows of the form:
((i, j), value)
where i and j are Long indexes and value is a Double.
How is it possible to first reduce the above RDD over j, and then reduce
the results over i?
Just to clarify, a Scala equivalent would look like this:
var results = 1.0  // identity for the product over i
for (i <- 0 until I) {
  var jReduction = 0.0
  for (j <- 0 until J) {
    // Reduce over j (here: a sum)
    jReduction = jReduction + rdd(i, j)
  }
  // Reduce over i (here: a product)
  results = results * jReduction
}
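
For concreteness, here is a sketch of the kind of two-stage reduction I
have in mind in RDD terms, using reduceByKey to fold j out of the key
and a plain reduce over the remaining per-i results. The sample data,
the local SparkContext setup, and the sum-then-product choice are just
for illustration:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("TwoStageReduction").setMaster("local[*]")
val sc = new SparkContext(conf)

// Hypothetical input: ((i, j), value) rows.
val rdd = sc.parallelize(Seq(
  ((0L, 0L), 1.0), ((0L, 1L), 2.0),
  ((1L, 0L), 3.0), ((1L, 1L), 4.0)
))

// Stage 1: reduce over j -- keep only i in the key and sum per i.
val reducedOverJ = rdd
  .map { case ((i, _), v) => (i, v) }
  .reduceByKey(_ + _)

// Stage 2: reduce over i -- combine the per-i sums (here: a product).
val result = reducedOverJ.values.reduce(_ * _)
// result == (1.0 + 2.0) * (3.0 + 4.0) == 21.0

Is something along these lines the idiomatic way to express
multi-dimensional reductions in Spark, or is there a better pattern?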