Hi Archit,

There are a lot of nice functions for joining key-value RDDs:
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
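As a rough sketch of what joining two key-value RDDs can look like (the User and Order types, the sample values, and the local SparkContext are all made up for illustration, so adapt them to your own data), you could key each data set with keyBy and then call join from PairRDDFunctions:

```scala
import org.apache.spark.SparkContext
// Implicit conversions that expose PairRDDFunctions (join, etc.) on RDDs of pairs.
// Required on older Spark releases; harmless on newer ones.
import org.apache.spark.SparkContext._

object JoinSketch {
  // Hypothetical record types, used only for this illustration.
  case class User(id: Int, name: String)
  case class Order(userId: Int, item: String)

  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded context is enough for the sketch.
    val sc = new SparkContext("local", "join-sketch")
    try {
      val users = sc.parallelize(Seq(User(1, "alice"), User(2, "bob")))
      val orders = sc.parallelize(
        Seq(Order(1, "book"), Order(1, "pen"), Order(3, "lamp")))

      // Factor out the common key from each data set.
      val usersById = users.keyBy(_.id)          // RDD[(Int, User)]
      val ordersByUser = orders.keyBy(_.userId)  // RDD[(Int, Order)]

      // Inner join: keeps only keys that appear on both sides.
      val joined = usersById.join(ordersByUser)  // RDD[(Int, (User, Order))]

      joined.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The same PairRDDFunctions class also has leftOuterJoin, rightOuterJoin and cogroup if you need to keep keys that appear on only one side.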
A common strategy I use is to take my two data sets, factor out a common 'key', and then run one of the join functions to get the results I am looking for.

It's hard to advise further on your specific case, as I do not know your data structure, but I hope this helps.

Gary

On Mon, Dec 16, 2013 at 6:00 AM, Archit Thakur <[email protected]> wrote:

> Hi,
>
> I want to read multiple paths into a single RDD.
>
> I know I can do it this way:
> sc.sequenceFile("/data/new_rdd_/*,-,-,-)
>
> What if they belong to different directories or maybe different machines?
>
> Is the only way to join two RDDs?
> That is, read the different paths into different RDDs and then join them all?
>
> But my real requirement is not to join all the RDDs but to MERGE them, like
> appending the 2nd to the 1st and so on.
>
> What is the best way to do this?
>
> Thanks and Regards,
> Archit Thakur.
