You should use join; it pairs the values for every matching key:
val rdd1 = sc.parallelize(List((1,(3)), (2,(5)), (3,(6))))
val rdd2 = sc.parallelize(List((2,(1)), (2,(3)), (3,(9))))

rdd1.join(rdd2).collect
res0: Array[(Int, (Int, Int))] = Array((2,(5,1)), (2,(5,3)), (3,(6,9)))
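
One caveat: join only emits keys that appear in *both* RDDs, so (1,(3)) from your desired output is dropped above. If you also want to keep keys that exist only in rdd1, a sketch with leftOuterJoin (note the right side becomes an Option you may need to unwrap):

val rdd1 = sc.parallelize(List((1,(3)), (2,(5)), (3,(6))))
val rdd2 = sc.parallelize(List((2,(1)), (2,(3)), (3,(9))))

// Unmatched keys from rdd1 get None on the right side.
rdd1.leftOuterJoin(rdd2).collect
// contains (in some order): (1,(3,None)), (2,(5,Some(1))), (2,(5,Some(3))), (3,(6,Some(9)))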

Please see my cheat sheet at
section 3.14, join(otherDataset, [numTasks]):
http://www.openkb.info/2015/01/scala-on-spark-cheatsheet.html


On Wed, Feb 4, 2015 at 3:52 PM, dash <[email protected]> wrote:

> Hey Spark gurus! Sorry for the confusing title. I do not know the exact
> description of my problem; if you know a better one, please tell me so I
> can change it :-)
>
> Say I have two RDDs right now, and they are
>
> val rdd1 = sc.parallelize(List((1,(3)), (2,(5)), (3,(6))))
> val rdd2 = sc.parallelize(List((2,(1)), (2,(3)), (3,(9))))
>
> I want to combine rdd1 and rdd2 to get rdd3, which looks like
>
> List((1,(3)), (2,(5,1)), (2,(5,3)), (3, (6,9)))
>
> The order in _._2 does not matter, so you can treat it as a Set.
>
> I tried to use zip, but since there is no guarantee that rdd1 and rdd2
> have the same length, I do not know if it is doable.
>
> I also looked into PairRDDFunctions; some people apply union to two RDDs
> and then a map function. Since I want all combinations according to _._1,
> I do not know how to achieve that with union and map.
>
> Thanks in advance!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/New-combination-like-RDD-based-on-two-RDDs-tp21508.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>


-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
