I have a similar issue: I want to join two RDDs with different value types into a single RDD.
file1.txt contains (ID, count) pairs:

val x: RDD[(Long, Int)] =
  sc.textFile("file1.txt")
    .map(line => line.split(","))
    .map(row => (row(0).toLong, row(1).toInt))

[(4407, 40), (2064, 38), (7815, 10), (5736, 17), (8031, 3)]

The second RDD, from file2.txt, contains (ID, name) pairs, where ID is common to both RDDs:

val y: RDD[(Long, String)]

[(4407, Jhon), (2064, Maria), (7815, Casto), (5736, Ram), (8031, XYZ)]

I'm expecting a result of the shape [(ID, Name, Count)]:

[(4407, Jhon, 40), (2064, Maria, 38), (7815, Casto, 10), (5736, Ram, 17), (8031, XYZ, 3)]

Any help will be really appreciated. Thanks.

On 21 November 2014 09:18, dsiegmann [via Apache Spark User List] <
ml-node+s1001560n19419...@n3.nabble.com> wrote:

> You want to use RDD.union (or SparkContext.union for many RDDs). These
> don't join on a key. Union doesn't really do anything itself, so it is low
> overhead. Note that the combined RDD will have all the partitions of the
> original RDDs, so you may want to coalesce after the union.
>
> val x = sc.parallelize(Seq( (1, 3), (2, 4) ))
> val y = sc.parallelize(Seq( (3, 5), (4, 7) ))
> val z = x.union(y)
>
> z.collect
> res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))
>
> On Thu, Nov 20, 2014 at 3:06 PM, Blind Faith <[hidden email]> wrote:
>
>> Say I have two RDDs with the following values
>>
>> x = [(1, 3), (2, 4)]
>>
>> and
>>
>> y = [(3, 5), (4, 7)]
>>
>> and I want to have
>>
>> z = [(1, 3), (2, 4), (3, 5), (4, 7)]
>>
>> How can I achieve this? I know you can use outerJoin followed by map to
>> achieve this, but is there a more direct way?
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 54 W 40th St, New York, NY 10018
> E: [hidden email]  W: www.velos.io

--
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email: hna...@wynyardgroup.com | Extn: 8019

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-join-two-RDDs-with-mutually-exclusive-keys-tp19417p19423.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
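P.S. Since my case needs the (ID, Name, Count) shape rather than a union, a keyed join looks closer to what I want. A minimal, untested sketch (assuming x and y are built as above, both keyed by the Long ID):

```scala
import org.apache.spark.rdd.RDD

// Join y (ID, name) with x (ID, count) on the common ID key.
// join yields RDD[(Long, (String, Int))]; map flattens each
// (id, (name, count)) pair into the (id, name, count) triple.
val joined: RDD[(Long, String, Int)] =
  y.join(x).map { case (id, (name, count)) => (id, name, count) }
```

An inner join drops IDs present in only one RDD; leftOuterJoin or fullOuterJoin would keep them as Options instead.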