I have had to use this as well.

Sometimes, I create a POJO to hold the multi-dimensional key to make things
easier.

e.g.

case class MultiKey(i: Int, j: Int, k: Int)

Then I can define a reduce function over the MultiKey, e.g.

def reduceByI(mkv1: (MultiKey, Value), mkv2: (MultiKey, Value)) =
  if (mkv1._1.i > mkv2._1.i) mkv1 else mkv2

and then I can do

rdd.reduce(reduceByI)
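
For example, here is a minimal sketch that can be pasted into spark-shell
(which already provides sc; the Value class, the field types, and the
sample data are assumptions for illustration):

case class MultiKey(i: Int, j: Int, k: Int)
case class Value(v: Double)  // hypothetical value type

// Keep whichever pair has the larger i in its key.
def reduceByI(mkv1: (MultiKey, Value), mkv2: (MultiKey, Value)) =
  if (mkv1._1.i > mkv2._1.i) mkv1 else mkv2

val rdd = sc.parallelize(Seq(
  (MultiKey(1, 2, 3), Value(10.0)),
  (MultiKey(4, 2, 3), Value(20.0))))
rdd.reduce(reduceByI)  // (MultiKey(4,2,3),Value(20.0))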

Thanks,
Shankari


On Thu, Jan 2, 2014 at 3:28 PM, Andrew Ash <[email protected]> wrote:

> If you had RDD[[i, j, k], value] then you could reduce by j by essentially
> mapping j into the key slot, doing the reduce, and then mapping it back:
>
> rdd.map { case ((i,j,k), v) => (j, (i,k,v)) }.reduce( ... ).map { case
> (j, (i,k,v)) => ((i,j,k), v) }
>
> It's not pretty, but I've had to use this pattern before too.
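>
> A fuller sketch of the same round trip, assuming an RDD[((Int, Int,
> Int), Double)] and a keep-the-max combiner (both assumptions for
> illustration), using reduceByKey for the per-key step:
>
> val byJ = rdd
>   .map { case ((i, j, k), v) => (j, (i, k, v)) }      // move j to the key slot
>   .reduceByKey { (a, b) => if (a._3 > b._3) a else b } // reduce per j
>   .map { case (j, (i, k, v)) => ((i, j, k), v) }      // restore the original shape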
>
>
> On Thu, Jan 2, 2014 at 6:23 PM, Aureliano Buendia <[email protected]> wrote:
>
>> Hi,
>>
>> How is it possible to reduce by multidimensional keys?
>>
>> For example, if every line is a tuple like:
>>
>> (i, j, k, value)
>>
>> or, alternatively:
>>
>> ((i, j, k), value)
>>
>> how can Spark handle reducing over j or k?
>>
>
>
