Hi guojc,

It is not clear to me what problem you are trying to solve. What do you want to do with the result of your groupByKey(myPartitioner).flatMapValues(x => x)? Do you want to use it in a join? Do you want to save it to your file system? Or do you want to do something else with it?
Thanks,
Meisam

On Fri, Nov 15, 2013 at 12:56 PM, guojc <[email protected]> wrote:
> Hi Meisam,
>   Thank you for the response. I know each RDD has a partitioner. What I want
> to achieve here is to re-partition a piece of data according to my custom
> partitioner. Currently I do that with groupByKey(myPartitioner).flatMapValues(x => x),
> but I'm a bit worried whether this will create an additional temporary object
> collection, as the result is first made into a Seq and then into a collection
> of tuples. Any suggestions?
>
> Best Regards,
> Jiacheng Guo
>
> On Sat, Nov 16, 2013 at 12:24 AM, Meisam Fathi <[email protected]> wrote:
>>
>> Hi Jiacheng,
>>
>> Each RDD has a partitioner. You can define your own partitioner if the
>> default partitioner does not suit your purpose.
>> You can take a look at this:
>> http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
>>
>> Thanks,
>> Meisam
>>
>> On Fri, Nov 15, 2013 at 6:54 AM, guojc <[email protected]> wrote:
>> > Hi,
>> >   I'm wondering whether a Spark RDD can have a partitionedByKey function.
>> > The use of this function would be to have an RDD distributed according to
>> > a certain partitioner and cache it. Later joins between RDDs with the
>> > same partitioner would then get a great speedup. Currently, we only have
>> > a groupByKey function, which generates a Seq of the desired type and is
>> > not very convenient.
>> >
>> > Btw, sorry for the last empty-body email. I mistakenly hit the send
>> > shortcut.
>> >
>> > Best Regards,
>> > Jiacheng Guo
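[Editor's note: for readers finding this thread later, Spark's pair-RDD API does expose partitionBy(partitioner), which re-distributes an RDD by a custom Partitioner without materializing a per-key Seq the way groupByKey(...).flatMapValues(x => x) does. The sketch below is illustrative only: it assumes a live SparkContext named sc, and the ModPartitioner class and sample data are invented for the example, not taken from the thread.]

```scala
import org.apache.spark.Partitioner

// Hypothetical custom partitioner for illustration: routes keys to
// partitions by hash code modulo the partition count.
class ModPartitioner(parts: Int) extends Partitioner {
  def numPartitions: Int = parts
  def getPartition(key: Any): Int = math.abs(key.hashCode) % parts
  override def equals(other: Any): Boolean = other match {
    case m: ModPartitioner => m.numPartitions == parts
    case _                 => false
  }
}

// Assumes `sc` is an existing SparkContext.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// partitionBy shuffles the pairs into the partitioner's layout directly,
// with no intermediate Seq per key, and cache() keeps the layout resident.
val partitioned = pairs.partitionBy(new ModPartitioner(4)).cache()

// An RDD partitioned with an equal partitioner can be joined with it
// without another shuffle of `partitioned`.
val other  = sc.parallelize(Seq(("a", "x"))).partitionBy(new ModPartitioner(4))
val joined = partitioned.join(other)
```

The equals override matters here: Spark compares partitioners for equality to decide whether a join can skip the shuffle, so two ModPartitioner instances with the same partition count must compare equal.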
