Hi guojc,

It is not clear to me what problem you are trying to solve. What do you want to do with the result of your groupByKey(myPartitioner).flatMapValues(x => x)? Do you want to use it in a join? Do you want to save it to your file system? Or do you want to do something else with it?
Thanks,
Meisam

On Fri, Nov 15, 2013 at 12:56 PM, guojc <[email protected]> wrote:
> Hi Meisam,
>   Thank you for the response. I know each RDD has a partitioner. What I want
> to achieve here is to re-partition a piece of data according to my custom
> partitioner. Currently I do that with groupByKey(myPartitioner).flatMapValues(x => x),
> but I'm a bit worried whether this will create an additional temporary object
> collection, as the result is first made into a Seq and then into a collection
> of tuples. Any suggestions?
>
> Best Regards,
> Jiacheng Guo
>
> On Sat, Nov 16, 2013 at 12:24 AM, Meisam Fathi <[email protected]> wrote:
>>
>> Hi Jiacheng,
>>
>> Each RDD has a partitioner. You can define your own partitioner if the
>> default partitioner does not suit your purpose.
>> You can take a look at this:
>> http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
>>
>> Thanks,
>> Meisam
>>
>> On Fri, Nov 15, 2013 at 6:54 AM, guojc <[email protected]> wrote:
>> > Hi,
>> >   I'm wondering whether a Spark RDD can have a partitionedByKey function.
>> > The use of this function would be to have an RDD distributed according to
>> > a certain partitioner and cache it. Later joins between RDDs with the
>> > same partitioner would then get a great speedup. Currently, we only have
>> > a groupByKey function, which generates a Seq of the desired type and is
>> > not very convenient.
>> >
>> > Btw, sorry for the last empty-body email. I mistakenly hit the send
>> > shortcut.
>> >
>> > Best Regards,
>> > Jiacheng Guo
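[Editor's note: for readers finding this thread later, Spark's pair-RDD API does expose partitionBy(partitioner), which re-distributes an RDD by a custom Partitioner without materializing a per-key Seq the way groupByKey(...).flatMapValues(x => x) does. The sketch below is illustrative only: it assumes a live SparkContext named sc, and the ModPartitioner class and sample data are invented for the example, not taken from the thread.]

```scala
import org.apache.spark.Partitioner

// Hypothetical custom partitioner for illustration: routes keys to
// partitions by hash code modulo the partition count.
class ModPartitioner(parts: Int) extends Partitioner {
  def numPartitions: Int = parts
  def getPartition(key: Any): Int = math.abs(key.hashCode) % parts
  override def equals(other: Any): Boolean = other match {
    case m: ModPartitioner => m.numPartitions == parts
    case _                 => false
  }
}

// Assumes `sc` is an existing SparkContext.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// partitionBy shuffles the pairs into the partitioner's layout directly,
// with no intermediate Seq per key, and cache() keeps the layout resident.
val partitioned = pairs.partitionBy(new ModPartitioner(4)).cache()

// An RDD partitioned with an equal partitioner can be joined with it
// without another shuffle of `partitioned`.
val other  = sc.parallelize(Seq(("a", "x"))).partitionBy(new ModPartitioner(4))
val joined = partitioned.join(other)
```

The equals override matters here: Spark compares partitioners for equality to decide whether a join can skip the shuffle, so two ModPartitioner instances with the same partition count must compare equal.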
