If you had an RDD[((i, j, k), value)], then you could reduce by j by essentially mapping j into the key slot, doing the reduce, and then mapping it back:
rdd.map { case ((i, j, k), v) => (j, (i, k, v)) }
   .reduceByKey( ... )
   .map { case (j, (i, k, v)) => ((i, j, k), v) }

It's not pretty, but I've had to use this pattern before too.

On Thu, Jan 2, 2014 at 6:23 PM, Aureliano Buendia <[email protected]> wrote:

> Hi,
>
> How is it possible to reduce by multidimensional keys?
>
> For example, if every line is a tuple like:
>
> (i, j, k, value)
>
> or, alternatively:
>
> ((i, j, k), value)
>
> how can spark handle reducing over j, or k?
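As a concrete instantiation of the pattern above, here is a minimal self-contained sketch that sums values over j on a local SparkContext. The object name, the sample data, and the choice of summation are made up for illustration; only the key-remapping trick itself comes from the reply above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceByMiddleKey {
  def main(args: Array[String]): Unit = {
    // Local context purely for demonstration purposes.
    val sc = new SparkContext(
      new SparkConf().setAppName("reduce-by-j").setMaster("local[*]"))

    // ((i, j, k), value) tuples; sample data is invented.
    val rdd = sc.parallelize(Seq(
      ((1, 10, 100), 1.0),
      ((2, 10, 200), 2.0),
      ((1, 20, 100), 3.0)
    ))

    // Move j into the key slot, then combine per j. Here the
    // combining function is a plain sum over the values.
    val sumByJ = rdd
      .map { case ((i, j, k), v) => (j, v) }
      .reduceByKey(_ + _)

    sumByJ.collect().foreach(println)
    sc.stop()
  }
}
```

With the sample data, j = 10 accumulates 1.0 + 2.0 and j = 20 keeps 3.0. If the reduce needs the i and k coordinates, carry them in the value as (j, (i, k, v)) instead, as the snippet above does, and decide in the combining function how conflicting (i, k) pairs should be merged.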
