Hi Meisam,
    Thank you for your response. I know each RDD has a partitioner. What I
want to achieve here is to re-partition a piece of data according to my
custom partitioner. Currently I do that with
groupByKey(myPartitioner).flatMapValues(x => x), but I'm a bit worried
that this will create an additional temporary object collection, as the
result is first made into a Seq and then a collection of tuples. Any
suggestions?
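For reference, a sketch of the alternative I'm asking about. This assumes Spark's PairRDDFunctions offers a partitionBy(partitioner) method that redistributes pairs without first materializing a Seq of values per key; the partitioner logic below is only illustrative:

```scala
import org.apache.spark.{Partitioner, SparkContext}

// Illustrative custom partitioner: routes each key to a partition.
class MyPartitioner(parts: Int) extends Partitioner {
  def numPartitions: Int = parts
  def getPartition(key: Any): Int =
    math.abs(key.hashCode) % parts // replace with custom routing logic
}

// Sketch only (assumes an existing SparkContext `sc`):
// partitionBy redistributes the pairs according to the partitioner and,
// unlike groupByKey(...).flatMapValues(x => x), avoids building an
// intermediate Seq per key before flattening.
// val pairs = sc.parallelize(Seq((1, "a"), (2, "b"), (1, "c")))
//               .partitionBy(new MyPartitioner(4))
//               .cache()
```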

Best Regards,
Jiacheng Guo


On Sat, Nov 16, 2013 at 12:24 AM, Meisam Fathi <[email protected]> wrote:

> Hi Jiacheng,
>
> Each RDD has a partitioner. You can define your own partitioner if the
> default partitioner does not suit your purpose.
> You can take a look at this
>
> http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
> .
>
> Thanks,
> Meisam
>
> On Fri, Nov 15, 2013 at 6:54 AM, guojc <[email protected]> wrote:
> > Hi,
> >   I'm wondering whether a Spark RDD can have a partitionedByKey
> > function? The use of this function would be to distribute an RDD
> > according to a certain partitioner and cache it. Further joins between
> > RDDs with the same partitioner would then get a great speedup.
> > Currently, we only have a groupByKey function, which generates a Seq of
> > the desired type, which is not very convenient.
> >
> > Btw, sorry for the last empty-body email. I mistakenly hit the send
> > shortcut.
> >
> >
> > Best Regards,
> > Jiacheng Guo
>
