Hi! There is RDD.cartesian(), which creates the Cartesian product of two RDDs. You could do data.cartesian(data) to get an RDD of all pairs of lines. It will have data.count * data.count elements, of course.
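A rough sketch of what that could look like with the Java API, starting from the sc and data in your snippet (the names pairs and grouped are just illustrative, and the exact return type of groupByKey depends on your Spark version):

    // Every line paired with every line (including itself),
    // so data.count() * data.count() elements in total.
    JavaPairRDD<String, String> pairs = data.cartesian(data);

    // Collect all the "other" lines for each line. In Spark 1.0+ the Java
    // groupByKey returns Iterable<String>; older releases used List<String>.
    JavaPairRDD<String, Iterable<String>> grouped = pairs.groupByKey();

If you don't want each line paired with itself, you could filter those pairs out before grouping.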
On Wed, Apr 23, 2014 at 4:48 PM, Jared Rodriguez <jrodrig...@kitedesk.com> wrote:
> Hi there,
>
> I am new to Spark and new to Scala, although I have lots of experience on
> the Java side. I am experimenting with Spark for a new project where it
> seems like it could be a good fit. As I go through the examples, there is
> one scenario that I am trying to figure out: comparing the contents of
> an RDD to itself to produce a new RDD.
>
> In an overly simple example, I have:
>
> JavaSparkContext sc = new JavaSparkContext ...
> JavaRDD<String> data = sc.parallelize(buildData());
>
> I then want to compare each entry in data to the other entries and end up with:
>
> JavaPairRDD<String, List<String>> mapped = data.???
>
> Is this something easily handled by Spark? My apologies if this is a
> stupid question; I have spent less than 10 hours tinkering with Spark and
> am trying to come up to speed.
>
> --
> Jared Rodriguez