Hi Jacek,

Thanks for the suggestion; I am going to try union. What is your opinion on the 2nd question?

Thanks
Shams

On Tue, Dec 1, 2015 at 3:23 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Never done it before, but just yesterday I found out about the
> SparkContext.union method that could help in your case.
>
> def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
> On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque <sham...@cashcare.in> wrote:
> > Hi All,
> >
> > I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
> > CustomerID; 2 of them have values of Iterable type and one has a single
> > bean. All RDDs use an id of Long type as the CustomerId. Below are the
> > models for the 3 RDDs:
> >
> > JavaPairRDD<Long, Iterable<TransactionInfo>>
> > JavaPairRDD<Long, Iterable<TransactionRaw>>
> > JavaPairRDD<Long, TransactionAggr>
> >
> > Now I have to merge these 3 RDDs into a single one so that I can generate
> > an Excel report per customer using the data in all 3 RDDs.
> > I tried the join transformation, but it needs RDDs of the same type and
> > it works for only two RDDs.
> > So my questions are:
> > 1. Is there any way to do the merge efficiently, so that I can get all 3
> > datasets by CustomerId?
> > 2. If I merge the first two using the join transformation, do I need to
> > run groupByKey() before the join so that all data related to a single
> > customer will be on one node?
> >
> > Thanks
> > Shams
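
On the first question in the thread, the goal is one record per CustomerId that holds the values from all three datasets, even though the value types differ. The sketch below illustrates that key-based grouping in plain Java with no Spark dependency, so it shows the intended result shape rather than a distributed implementation. The class and method names (`CustomerMerge`, `CustomerData`, `mergeByCustomer`) are hypothetical, and `String` values stand in for `TransactionInfo`, `TransactionRaw` and `TransactionAggr`. In Spark itself, `JavaPairRDD.cogroup(other1, other2)` appears to be the closest built-in operation for this shape, since it accepts pair RDDs with different value types; that suggestion is an assumption here and was not tested against the poster's data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for a three-way, key-based grouping. No Spark involved:
// String values are hypothetical placeholders for TransactionInfo,
// TransactionRaw and TransactionAggr.
final class CustomerMerge {

    // Hypothetical holder for everything known about one customer.
    static final class CustomerData {
        final List<String> infos = new ArrayList<>();
        final List<String> raws = new ArrayList<>();
        String aggr; // stays null when the customer has no aggregate record

        @Override
        public String toString() {
            return "infos=" + infos + ", raws=" + raws + ", aggr=" + aggr;
        }
    }

    // Groups the three inputs by customer id. A customer missing from one
    // input still appears in the result, with an empty list or null there.
    static Map<Long, CustomerData> mergeByCustomer(
            Map<Long, List<String>> infos,
            Map<Long, List<String>> raws,
            Map<Long, String> aggrs) {
        Map<Long, CustomerData> merged = new TreeMap<>();
        infos.forEach((id, values) ->
                merged.computeIfAbsent(id, k -> new CustomerData()).infos.addAll(values));
        raws.forEach((id, values) ->
                merged.computeIfAbsent(id, k -> new CustomerData()).raws.addAll(values));
        aggrs.forEach((id, value) ->
                merged.computeIfAbsent(id, k -> new CustomerData()).aggr = value);
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> infos = new HashMap<>();
        infos.put(1L, Arrays.asList("info-a", "info-b"));
        Map<Long, List<String>> raws = new HashMap<>();
        raws.put(1L, Arrays.asList("raw-a"));
        raws.put(2L, Arrays.asList("raw-b"));
        Map<Long, String> aggrs = new HashMap<>();
        aggrs.put(2L, "aggr-2");

        // One entry per customer, ready to drive a per-customer report.
        mergeByCustomer(infos, raws, aggrs)
                .forEach((id, data) -> System.out.println(id + " -> " + data));
    }
}
```

Unlike an inner join, this grouping keeps customers that are missing from one of the inputs, which is usually what a per-customer report needs; whether that is the desired behaviour here is for the original poster to decide.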