I think you should be able to join different RDDs that share the same key type, even when their value types differ. Have you tried that?

On Dec 1, 2015 3:30 PM, "Praveen Chundi" <mail.chu...@gmail.com> wrote:
> cogroup could be useful to you, since all three are PairRDDs.
>
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
>
> Best Regards,
> Praveen
>
> On 01.12.2015 10:47, Shams ul Haque wrote:
>
>> Hi All,
>>
>> I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
>> CustomerID; 2 RDDs have values of Iterable type and one has a single
>> bean. All RDDs use an id of Long type as the CustomerId. Below are the
>> models for the 3 RDDs:
>> JavaPairRDD<Long, Iterable<TransactionInfo>>
>> JavaPairRDD<Long, Iterable<TransactionRaw>>
>> JavaPairRDD<Long, TransactionAggr>
>>
>> Now I have to merge these 3 RDDs into a single one so that I can
>> generate an Excel report per customer using the data in all 3 RDDs.
>> I tried the join transformation, but it needs RDDs of the same type
>> and it works for only two RDDs.
>> So my questions are:
>> 1. Is there any way to do this merge efficiently, so that I can
>> get all 3 datasets by CustomerId?
>> 2. If I merge the first two using the join transformation, do I need to
>> run groupByKey() before the join so that all data related to a single
>> customer will be on one node?
>>
>> Thanks
>> Shams
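
To make the cogroup suggestion concrete: in Spark's Java API, JavaPairRDD.cogroup(other1, other2) takes two more pair RDDs with the same key type and returns, for each key, a Tuple3 of Iterables with one Iterable per input. The value types do not have to match. Both cogroup and join shuffle by key themselves, so a separate groupByKey() beforehand should not be needed. The sketch below is not Spark code. It emulates those semantics in plain Java so it runs without a cluster. The class name CogroupSketch, the holder Grouped, and the String values standing in for TransactionInfo, TransactionRaw and TransactionAggr are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CogroupSketch {

    // Hypothetical holder for one key's merged data. Spark's cogroup puts a
    // Tuple3 of Iterables in this position instead.
    static final class Grouped<A, B, C> {
        final List<A> first = new ArrayList<>();
        final List<B> second = new ArrayList<>();
        final List<C> third = new ArrayList<>();
    }

    // Emulates cogroup over three keyed datasets: every key present in any
    // input gets one entry, and an input that lacks the key contributes an
    // empty list rather than dropping the key (unlike an inner join).
    static <K extends Comparable<K>, A, B, C> Map<K, Grouped<A, B, C>> cogroup(
            Map<K, A> a, Map<K, B> b, Map<K, C> c) {
        Map<K, Grouped<A, B, C>> out = new TreeMap<>();
        a.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).first.add(v));
        b.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).second.add(v));
        c.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).third.add(v));
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for the three RDDs keyed by CustomerId (assumed sample data).
        Map<Long, List<String>> info = new HashMap<>();
        info.put(1L, Arrays.asList("info-a", "info-b"));
        info.put(2L, Arrays.asList("info-c"));

        Map<Long, List<String>> raw = new HashMap<>();
        raw.put(1L, Arrays.asList("raw-a"));

        Map<Long, String> aggr = new HashMap<>();
        aggr.put(1L, "aggr-1");
        aggr.put(2L, "aggr-2");

        // One record per customer, carrying data from all three sources.
        cogroup(info, raw, aggr).forEach((id, g) ->
                System.out.println(id + " -> " + g.first + " " + g.second + " " + g.third));
    }
}
```

With real RDDs the corresponding call would have the shape infoRdd.cogroup(rawRdd, aggrRdd), where the three variable names are placeholders. Because two of the inputs already hold Iterable values, the first two slots of each result would be an Iterable of Iterables, so cogrouping the ungrouped pair RDDs instead may be simpler.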