Hi Jacek,

Thanks for the suggestion; I am going to try union.
And what is your opinion on the 2nd question?


Thanks
Shams

On Tue, Dec 1, 2015 at 3:23 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Never done it before, but just yesterday I found out about
> SparkContext.union method that could help in your case.
>
> def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]
>
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
> http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque <sham...@cashcare.in>
> wrote:
> > Hi All,
> >
> > I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
> > CustomerID; 2 of them have values of Iterable type and one has a single
> > bean. All RDDs use an id of Long type as the CustomerId. Below are the
> > models for the 3 RDDs:
> > JavaPairRDD<Long, Iterable<TransactionInfo>>
> > JavaPairRDD<Long, Iterable<TransactionRaw>>
> > JavaPairRDD<Long, TransactionAggr>
> >
> > Now I have to merge all 3 RDDs into a single one so that I can generate
> > an Excel report for each customer using the data in all 3 RDDs.
> > I tried using the Join transformation, but it needs RDDs of the same type
> > and it works for only two RDDs.
> > So my questions are:
> > 1. Is there any way to do this merge efficiently, so that I can get
> > all 3 datasets by CustomerId?
> > 2. If I merge the first two using the Join transformation, do I need to
> > run groupByKey() before the join so that all data related to a single
> > customer will be on one node?
> >
> >
> > Thanks
> > Shams
>
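
One way to picture the merge asked about in the original message is to group all three keyed datasets by CustomerId in one pass, keeping a list for each of the two Iterable-valued sides and a single slot for the aggregate. The sketch below does this with plain java.util collections rather than Spark, so it runs without a cluster. The names CustomerMerge, Merged, and merge are hypothetical, made up for illustration, and the String and Double values are stand-ins for TransactionInfo, TransactionRaw, and TransactionAggr. In Spark's Java API, JavaPairRDD.cogroup(other1, other2) produces a comparable per-key grouping for three pair RDDs whose value types differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from Spark.
final class CustomerMerge {

    // Holder for everything known about one customer (hypothetical type).
    static final class Merged<A, B, C> {
        final List<A> first = new ArrayList<>();
        final List<B> second = new ArrayList<>();
        C single; // stays null when the customer is missing from the third dataset
    }

    // Groups three keyed datasets by customer id, keeping ids found in any of them.
    static <A, B, C> Map<Long, Merged<A, B, C>> merge(
            Map<Long, List<A>> firstById,
            Map<Long, List<B>> secondById,
            Map<Long, C> singleById) {
        Map<Long, Merged<A, B, C>> out = new TreeMap<>(); // sorted by customer id
        firstById.forEach((id, values) -> {
            out.computeIfAbsent(id, k -> new Merged<A, B, C>()).first.addAll(values);
        });
        secondById.forEach((id, values) -> {
            out.computeIfAbsent(id, k -> new Merged<A, B, C>()).second.addAll(values);
        });
        singleById.forEach((id, value) -> {
            out.computeIfAbsent(id, k -> new Merged<A, B, C>()).single = value;
        });
        return out;
    }

    public static void main(String[] args) {
        // Sample data: Strings and Doubles stand in for the three bean types.
        Map<Long, List<String>> infos = new HashMap<>();
        infos.put(1L, Arrays.asList("info-a", "info-b"));
        Map<Long, List<String>> raws = new HashMap<>();
        raws.put(1L, Arrays.asList("raw-a"));
        raws.put(2L, Arrays.asList("raw-b"));
        Map<Long, Double> aggrs = new HashMap<>();
        aggrs.put(1L, 42.0);

        // One entry per customer, ready to be turned into a report row.
        merge(infos, raws, aggrs).forEach((id, m) ->
                System.out.println(id + ": " + m.first + " " + m.second + " " + m.single));
    }
}
```

On the 2nd question: Spark's join and cogroup partition both inputs by key themselves, so an extra groupByKey() beforehand should not be needed just to bring one customer's data together.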
