How can I apply such an inner join in Spark Scala/Python?

Blind Faith Mon, 17 Nov 2014 09:54:18 -0800

So let us say I have RDDs A and B with the following values.

A = [ (1, 2), (2, 4), (3, 6) ]


B = [ (1, 3), (2, 5), (3, 6), (4, 5), (5, 6) ]

I want to apply an inner join, such that I get the following as a result.

C = [ (1, (2, 3)), (2, (4, 5)), (3, (6, 6)) ]

That is, keys that are not present in A (here 4 and 5) should not appear
in the result of the inner join.
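
To make the intended semantics concrete, here is a minimal sketch in plain
Python (no Spark needed) that produces C from A and B. The helper name
inner_join is hypothetical and only for illustration; it is not a Spark
function.

```python
# Plain-Python sketch of the inner join described above (no Spark required).
A = [(1, 2), (2, 4), (3, 6)]
B = [(1, 3), (2, 5), (3, 6), (4, 5), (5, 6)]


def inner_join(left, right):
    """Pair each left value with every right value that shares its key.

    Keys that occur on only one side are dropped. `inner_join` is a
    hypothetical helper name, not part of any Spark API.
    """
    # Index the right-hand side by key so each lookup is a dict access.
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)

    # Emit (key, (left_value, right_value)) only for keys found on both sides.
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


C = inner_join(A, B)
print(C)  # [(1, (2, 3)), (2, (4, 5)), (3, (6, 6))]
```

As far as I know, the join method on pair RDDs (A.join(B) in PySpark, and
join on an RDD of key/value pairs in Scala) has these same inner-join
semantics, although Spark does not guarantee the order of the results.
That call was not run as part of this sketch.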

How can I achieve that? I can see outer join functions in the Spark RDD
class, but nothing named innerJoin.
