Re: Re: reading a specific key-value

Mark Hamstra Fri, 13 Dec 2013 12:20:13 -0800

It means that the partitioner field of the RDD (an Option[Partitioner]) is
Some(p), not None.  That, in turn, means that for a key k, the RDD knows
which partition contains k.


In order for that to be true, the RDD has to have been partitioned by key,
and after that only partition-preserving transformations can have been
performed on the RDD[(K,V)].
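
To make that concrete, here is a minimal plain-Scala sketch of the idea. It is a model, not Spark code: HashPart, PairData, and the "partitions scanned" counter are hypothetical names invented for illustration, and the real lookup() may be implemented differently. It only shows why a recorded partitioner lets a lookup touch one partition instead of all of them, and why a transformation that may change keys has to drop the partitioner.

```scala
// Plain-Scala model of a key-partitioned pair collection; not Spark code.
object KnownPartitionerSketch {

  // Hypothetical stand-in for a hash partitioner: maps a key to a partition index.
  final case class HashPart(numPartitions: Int) {
    def getPartition(key: Any): Int = {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw // keep the index non-negative
    }
  }

  // Hypothetical stand-in for an RDD[(K, V)]: the data split into partitions,
  // plus an optional record of how keys were assigned to them.
  final case class PairData[K, V](
      partitions: Vector[Vector[(K, V)]],
      partitioner: Option[HashPart]) {

    // Changes only values, so every key stays in its partition and the
    // recorded partitioner remains valid.
    def mapValues[W](f: V => W): PairData[K, W] =
      PairData(partitions.map(_.map { case (k, v) => (k, f(v)) }), partitioner)

    // May change keys, so the old key-to-partition mapping cannot be trusted.
    def map[K2, V2](f: ((K, V)) => (K2, V2)): PairData[K2, V2] =
      PairData(partitions.map(_.map(f)), None)

    // Returns the matching values and the number of partitions scanned.
    def lookup(key: K): (Seq[V], Int) = partitioner match {
      case Some(p) =>
        // Known partitioner: only one partition can contain the key.
        val hits = partitions(p.getPartition(key)).collect { case (k, v) if k == key => v }
        (hits, 1)
      case None =>
        // No partitioner: every partition has to be scanned, like a filter.
        val hits = partitions.flatten.collect { case (k, v) if k == key => v }
        (hits, partitions.size)
    }
  }

  // Distribute pairs by key and record the partitioner that was used.
  def partitionBy[K, V](data: Seq[(K, V)], p: HashPart): PairData[K, V] = {
    val grouped = data.groupBy { case (k, _) => p.getPartition(k) }
    val parts = Vector.tabulate(p.numPartitions) { i =>
      grouped.getOrElse(i, Seq.empty[(K, V)]).toVector
    }
    PairData(parts, Some(p))
  }
}
```

In Spark itself the corresponding calls would be along the lines of rdd.partitionBy(new HashPartitioner(n)) followed by lookup(key); mapValues keeps the partitioner, while map does not.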


On Fri, Dec 13, 2013 at 12:07 PM, Yadid Ayzenberg <[email protected]> wrote:

>  Oops, meant to send to the entire list...
>
>
> -------- Original Message --------
> Subject: Re: reading a specific key-value
> Date: Fri, 13 Dec 2013 14:56:22 -0500
> From: Yadid Ayzenberg <[email protected]>
> To: K. Shankari <[email protected]>
>
> It says it is more efficient if the RDD has a "known" partitioner. What
> does that mean?
>
> Yadid
>
>
>
> On 12/13/13 2:11 PM, K. Shankari wrote:
>
>  I think that you want the lookup() method in PairRDDFunctions?
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
>
>  It is supposed to be more efficient than filter...
>
> Shankari
>
>
>
> On Thu, Dec 12, 2013 at 7:30 PM, Yadid <[email protected]> wrote:
>
>> I have a pair RDD and I would like to access a specific key-value pair.
>> The first thing that comes to mind is filtering on the specified key,
>> but that seems very inefficient, since it would iterate over the entire
>> RDD, and even more so if I need to access several keys.
>>
>> Is there any other way to do this? It seems like a really useful
>> feature. I'm guessing that in order to implement this, I would need a
>> mapping of keys to partitions, and a method to access data from a specific
>> partition.
>>
>> Yadid
>>
>>
>
>
>
>
