Hi,

It seems Spark does not support nested RDDs, so I was wondering how Spark
can handle multi-dimensional reductions.

As an example, consider a dataset whose rows have this shape:

((i, j), value)

where i and j are long indexes, and value is a double.

How is it possible to first reduce the above RDD over j, and then reduce
the results over i?

Just to clarify, a plain Scala equivalent would look like this:

var results = 1.0
for (i <- 0 until I) {
  var jReduction = 0.0
  for (j <- 0 until J) {
    // Reduce over j: sum the values for this i
    jReduction = jReduction + rdd(i, j)
  }
  // Reduce over i: multiply the per-i sums together
  results = results * jReduction
}
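
For reference, here is a rough sketch of the kind of two-pass approach I
imagine in Spark, flattening the nesting into a reduceByKey keyed by i
followed by a plain reduce; the sample data, object name, and master
setting are just placeholders, and I am not sure this is the idiomatic way:

import org.apache.spark.{SparkConf, SparkContext}

object NestedReduction {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("nested-reduction").setMaster("local[*]"))

    // Sample rows in the ((i, j), value) shape described above.
    val rdd = sc.parallelize(Seq(
      ((0L, 0L), 1.0), ((0L, 1L), 2.0),
      ((1L, 0L), 3.0), ((1L, 1L), 4.0)))

    // Step 1: reduce over j, re-keying each row by i alone and summing,
    // so we get one (i, sum over j) pair per i.
    val reducedOverJ = rdd
      .map { case ((i, _), value) => (i, value) }
      .reduceByKey(_ + _)

    // Step 2: reduce over i by multiplying the per-i sums together.
    val result = reducedOverJ.values.reduce(_ * _)

    println(result)  // (1 + 2) * (3 + 4) = 21.0
    sc.stop()
  }
}

Is this roughly the right approach, or is there a better primitive for
this kind of staged reduction?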
