Hi Jiacheng,

Each RDD has a partitioner. You can define your own partitioner if the
default partitioner does not suit your purpose.
You can take a look at this
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf.
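
For example, here is a minimal sketch of what Jiacheng described, using partitionBy and HashPartitioner from the Spark API (the variable names and the assumption that `sc` is an existing SparkContext are illustrative, not from his code):

```scala
import org.apache.spark.HashPartitioner

// Assuming `sc` is an existing SparkContext and the data is (key, value) pairs.
val partitioner = new HashPartitioner(16)

val left = sc.parallelize(Seq(("a", 1), ("b", 2)))
  .partitionBy(partitioner)  // distribute records by key
  .cache()                   // keep the partitioned layout in memory

val right = sc.parallelize(Seq(("a", "x"), ("b", "y")))
  .partitionBy(partitioner)

// Both sides share the same partitioner, so the join can use the
// existing partitioning instead of reshuffling the cached side.
val joined = left.join(right)
```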

Thanks,
Meisam

On Fri, Nov 15, 2013 at 6:54 AM, guojc <[email protected]> wrote:
> Hi,
>   I'm wondering whether Spark RDDs can have a partitionedByKey function? The
> use of this function would be to distribute an RDD according to a certain
> partitioner and cache it. Further joins between RDDs with the same
> partitioner would then be greatly sped up. Currently, we only have a
> groupByKey function, which generates a Seq of the desired type and is not
> very convenient.
>
> Btw, sorry for the last empty-body email. I mistakenly hit the send shortcut.
>
>
> Best Regards,
> Jiacheng Guo
