Hi Jiacheng,

Each RDD can carry a partitioner. You can define your own partitioner if the default one does not suit your purpose. You can take a look at these slides: http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
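As a minimal sketch of what you describe (assuming an existing SparkContext `sc`; the RDD names and data are illustrative, not from any real job): pair RDDs expose a `partitionBy` method, so you can repartition both sides of a join with the same `HashPartitioner` and cache the results, which lets the join avoid a full shuffle.

```scala
import org.apache.spark.HashPartitioner

// Hypothetical key-value RDDs; assumes `sc` is an existing SparkContext.
val left  = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
val right = sc.parallelize(Seq((1, "x"), (2, "y")))

// Partition both RDDs with the same partitioner and cache the results,
// so later joins can reuse the co-partitioned layout.
val part = new HashPartitioner(8)
val leftPart  = left.partitionBy(part).cache()
val rightPart = right.partitionBy(part).cache()

// Both sides share the same partitioner, so this join does not
// need to re-shuffle either input.
val joined = leftPart.join(rightPart)
```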
Thanks,
Meisam

On Fri, Nov 15, 2013 at 6:54 AM, guojc <[email protected]> wrote:
> Hi,
> I'm wondering whether a Spark RDD can have a partitionedByKey function. The
> use of this function would be to distribute an RDD according to a certain
> partitioner and cache it. Further joins between RDDs with the same
> partitioner would then get a great speedup. Currently, we only have a
> groupByKey function, which generates a Seq of the desired type and is not
> very convenient.
>
> Btw, sorry for the last empty-body email. I mistakenly hit the send
> shortcut.
>
> Best Regards,
> Jiacheng Guo
