Hi! There is RDD.cartesian(), which creates the Cartesian product of two
RDDs. You could do data.cartesian(data) to get an RDD of all ordered pairs
of lines. It will contain data.count * data.count elements, so it grows
quadratically with the input, and it includes each line paired with itself.
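
Since you are on the Java side, here is a rough sketch of what that could
look like there. It assumes Spark 1.0 or later with Java 8 lambdas; the class
name SelfCartesian, the helper pairCounts, and the local[2] master are made up
for illustration, and the filter step assumes the entries are distinct strings.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical example class, not part of Spark.
public class SelfCartesian {

    // Returns {count of all ordered pairs, count after dropping self-pairs}.
    static long[] pairCounts(JavaSparkContext sc, List<String> lines) {
        JavaRDD<String> data = sc.parallelize(lines);

        // Every ordered pair (a, b): includes (a, a), and both (a, b) and (b, a).
        JavaPairRDD<String, String> pairs = data.cartesian(data);

        // Drop pairs whose two sides are equal (assumes distinct entries).
        JavaPairRDD<String, String> others =
            pairs.filter(p -> !p._1().equals(p._2()));

        return new long[] { pairs.count(), others.count() };
    }

    public static void main(String[] args) {
        // Local master chosen only so the sketch runs without a cluster.
        SparkConf conf = new SparkConf()
            .setAppName("SelfCartesian")
            .setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            long[] counts = pairCounts(sc, Arrays.asList("a", "b", "c"));
            System.out.println(counts[0] + " pairs, "
                + counts[1] + " without self-pairs");
        } finally {
            sc.stop();
        }
    }
}
```

With three distinct entries this gives 9 pairs in total and 6 once the
self-pairs are filtered out. Calling groupByKey() on the filtered pair RDD
would then collect, for each entry, the other entries it was paired with.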



On Wed, Apr 23, 2014 at 4:48 PM, Jared Rodriguez <jrodrig...@kitedesk.com> wrote:

> Hi there,
>
> I am new to Spark and new to Scala, although I have lots of experience on
> the Java side.  I am experimenting with Spark for a new project where it
> seems like it could be a good fit.  As I go through the examples, there is
> one scenario that I am trying to figure out: comparing the contents of
> an RDD to itself to produce a new RDD.
>
> In an overly simple example, I have:
>
> JavaSparkContext sc = new JavaSparkContext ...
> JavaRDD<String> data = sc.parallelize(buildData());
>
> I then want to compare each entry in data to other entries and end up with:
>
> JavaPairRDD<String, List<String>> mapped = data.???
>
> Is this something easily handled by Spark?  My apologies if this is a
> stupid question, I have spent less than 10 hours tinkering with Spark and
> am trying to come up to speed.
>
>
> --
> Jared Rodriguez
>
>
