zipWithIndex is a Scala collection method, and it is also implemented on RDDs. You can use map to transform what you have into what you want, effectively "selecting" out the things you need and dropping the generated ID.
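To make the idea concrete, here is a minimal sketch using plain Scala collections as a stand-in for the PairRDD, so it runs without Spark. The object and method names (PairsWithinKey, pairsWithinKey) are made up for illustration, and the for-comprehension plays the role of the self-join. One assumption beyond what was suggested in the thread: it keeps only pairs whose first row ID is smaller than the second, rather than just excluding equal IDs, so mirrored pairs such as "B A" are dropped as well.

```scala
// Sketch only: a Seq of (key, value) tuples stands in for the PairRDD.
// PairsWithinKey and pairsWithinKey are hypothetical names.
object PairsWithinKey {

  def pairsWithinKey[K, V](records: Seq[(K, V)]): Seq[(V, V)] = {
    // Attach a row ID to every record: (key, (value, rowId)).
    val indexed: Seq[(K, (V, Int))] =
      records.zipWithIndex.map { case ((k, v), id) => (k, (v, id)) }

    // Self-join on the key (on an RDD this would be a join with itself).
    val joined: Seq[((V, Int), (V, Int))] = for {
      (k1, left)  <- indexed
      (k2, right) <- indexed
      if k1 == k2
    } yield (left, right)

    joined
      // Assumption: idA < idB removes self-pairs and mirrored pairs.
      .filter { case ((_, idA), (_, idB)) => idA < idB }
      // Drop the generated row IDs so later steps never see them.
      .map { case ((a, _), (b, _)) => (a, b) }
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      "K1" -> "A", "K1" -> "B", "K1" -> "C",
      "K2" -> "D", "K2" -> "D", "K2" -> "E"
    )
    pairsWithinKey(input).foreach { case (a, b) => println(s"$a $b") }
  }
}
```

On the sample data from the original question this yields A B, A C, B C, D D, and then D E twice, once for each of the two D rows. If only one D E is wanted, an extra de-duplication step would be needed, and it would have to treat D D as a legitimate pair.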
As Nathan notes, this literal join approach might not be the fastest thing, but it should work.
--
Sean Owen | Director, Data Science | London

On Fri, Feb 14, 2014 at 4:47 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
> Thanks Sean. Is zipWithIndex available in the Java API? Also, how do I
> remove the generated id from further processing?
>
> Best Regards,
> Sonal
> Nube Technologies
>
> On Fri, Feb 14, 2014 at 9:14 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> You could do a zipWithIndex to add a sort of "row ID" to each element
>> of the input RDD. Then after self-joining, exclude elements whose row
>> ID is the same.
>> --
>> Sean Owen | Director, Data Science | London
>>
>>
>> On Fri, Feb 14, 2014 at 3:42 PM, Sonal Goyal <sonalgoy...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I have some PairRDDs like
>> >
>> > K1 A
>> > K1 B
>> > K1 C
>> >
>> > K2 D
>> > K2 D
>> > K2 E
>> >
>> > and I want to create
>> >
>> > A B
>> > A C
>> > B C
>> > D D
>> > D E
>> >
>> > What's the best way to do this? If I join the RDD with itself, I will
>> > end up with A A, which I do not want. I can't do distinct, as that will
>> > filter out the D D, which I want.
>> >
>> > Any pointers? Thanks.
>> >
>> > Best Regards,
>> > Sonal
>> > Nube Technologies