cogroup could be useful to you, since all three are PairRDDs keyed by the same CustomerId:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
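To show what a three-way cogroup gives you per key, here is a minimal plain-Java sketch. It is not Spark code: the class CogroupSketch, its Grouped holder and the String stand-ins for your beans are all hypothetical names used only for illustration. In Spark's Java API the corresponding call should be something like infoRdd.cogroup(rawRdd, aggrRdd) (variable names assumed), which returns a JavaPairRDD<Long, Tuple3<...>> holding one Iterable per input RDD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a three-way cogroup; this is NOT Spark code.
public class CogroupSketch {

    // Everything found for one key. An empty list means that input had no entry for the key.
    public static final class Grouped<A, B, C> {
        public final List<A> first = new ArrayList<>();
        public final List<B> second = new ArrayList<>();
        public final List<C> third = new ArrayList<>();
    }

    // Collects the values of all three inputs under their shared key.
    public static <A, B, C> Map<Long, Grouped<A, B, C>> cogroup(
            Map<Long, A> a, Map<Long, B> b, Map<Long, C> c) {
        Map<Long, Grouped<A, B, C>> out = new TreeMap<>();
        a.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).first.add(v));
        b.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).second.add(v));
        c.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).third.add(v));
        return out;
    }

    public static void main(String[] args) {
        // Strings stand in for TransactionInfo, TransactionRaw and TransactionAggr.
        Map<Long, List<String>> info = new HashMap<>();
        info.put(1L, Arrays.asList("info-a", "info-b"));
        Map<Long, List<String>> raw = new HashMap<>();
        raw.put(1L, Arrays.asList("raw-a"));
        raw.put(2L, Arrays.asList("raw-b"));
        Map<Long, String> aggr = new HashMap<>();
        aggr.put(2L, "aggr-2");

        Map<Long, Grouped<List<String>, List<String>, String>> merged =
                cogroup(info, raw, aggr);
        // One line per customer, with whatever each input held for that id.
        merged.forEach((id, g) ->
                System.out.println(id + " -> " + g.first + " | " + g.second + " | " + g.third));
    }
}
```

Unlike an inner join, a customer missing from one input still appears, just with an empty slot for that input. Regarding your second question: as far as I know cogroup (like join) partitions its inputs by key itself, so a separate groupByKey() beforehand should not be needed to bring one customer's data together.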
Best Regards,
Praveen
On 01.12.2015 10:47, Shams ul Haque wrote:
Hi All,
I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
CustomerID; 2 of them have values of Iterable type and one has a
single bean. All RDDs use an id of Long type as the CustomerId. Below are
the models for the 3 RDDs:
JavaPairRDD<Long, Iterable<TransactionInfo>>
JavaPairRDD<Long, Iterable<TransactionRaw>>
JavaPairRDD<Long, TransactionAggr>
Now I have to merge all 3 RDDs into a single one so that I can
generate an Excel report per customer using the data in all 3 RDDs.
I tried using the join transformation, but it needs RDDs of the same type
and it only works for two RDDs.
So my questions are:
1. Is there any way to do this merge efficiently, so that I can
get all 3 datasets by CustomerId?
2. If I merge the first two using the join transformation, do I need to run
groupByKey() before the join so that all data related to a single customer
will be on one node?
Thanks
Shams