Thanks, I understand. I'm using Java newAPIHadoopRDD. It seems that there
is no way to define the partitioner when creating the RDD, correct?
Does this mean I have to call partitionBy? It seems like it would be a
lot more efficient to be able to define the partitioner on RDD creation.
Yadid
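For reference: an RDD read through newAPIHadoopRDD is laid out by input split and reports no partitioner, so in Spark's Java API the usual route is JavaPairRDD.partitionBy(new HashPartitioner(n)). It is usually worth persisting the result so the shuffle is not redone for every later lookup. The plain-Java sketch below (no Spark dependency; PartitionBySketch and its methods are hypothetical names) models only what that step does: one pass that moves every record from its split into the partition its key's hash selects. The hash rule follows the same idea as Spark's HashPartitioner but is not Spark's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a partitionBy step: records arrive grouped by input
// split (no key-to-partition rule) and are redistributed once by key hash.
class PartitionBySketch {
    // Non-negative hashCode() modulo the partition count; null keys go to 0.
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // One pass over every record of every split (the "shuffle").
    static <K, V> List<List<Map.Entry<K, V>>> partitionBy(
            List<List<Map.Entry<K, V>>> splits, int numPartitions) {
        List<List<Map.Entry<K, V>>> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.add(new ArrayList<>());
        }
        for (List<Map.Entry<K, V>> split : splits) {
            for (Map.Entry<K, V> e : split) {
                out.get(partitionFor(e.getKey(), numPartitions)).add(e);
            }
        }
        return out;
    }

    // True if every record sits in the partition the hash rule assigns to its key.
    static <K, V> boolean isHashPartitioned(List<List<Map.Entry<K, V>>> parts) {
        for (int i = 0; i < parts.size(); i++) {
            for (Map.Entry<K, V> e : parts.get(i)) {
                if (partitionFor(e.getKey(), parts.size()) != i) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Split-ordered input generally fails isHashPartitioned; the output of partitionBy always passes it, which is the property a known partitioner relies on.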
On 12/13/13 3:19 PM, Mark Hamstra wrote:
It means that the partitioner (Option[Partitioner]) field of the RDD
is Some(p), not None. That, in turn, means that for a key k, the RDD
knows how to find the partition that contains k.
In order for that to be true, the RDD has to have been partitioned by
key, and after that only partition-preserving transformations can have
been performed on the RDD[(K,V)].
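A minimal plain-Java sketch of what a known partitioner buys (no Spark dependency; HashPartitionedPairs and its methods are hypothetical names). Every pair is placed by one rule, non-negative hashCode() modulo the partition count, similar in spirit to Spark's HashPartitioner, so a lookup for key k can compute the single partition that may hold k and scan only that one.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a pair RDD whose partitioner is known (Some(p)).
class HashPartitionedPairs<K, V> {
    private final List<List<Map.Entry<K, V>>> partitions = new ArrayList<>();

    HashPartitionedPairs(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // The partitioning rule: non-negative hashCode() modulo the partition
    // count (null keys go to partition 0).
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Every pair is placed by the same rule; this is what makes the
    // key-to-partition mapping "known".
    void add(K key, V value) {
        int p = partitionFor(key, partitions.size());
        partitions.get(p).add(new SimpleEntry<>(key, value));
    }

    // Scans only the single partition that can contain the key.
    List<V> lookup(K key) {
        List<V> result = new ArrayList<>();
        for (Map.Entry<K, V> e : partitions.get(partitionFor(key, partitions.size()))) {
            if (Objects.equals(e.getKey(), key)) {
                result.add(e.getValue());
            }
        }
        return result;
    }
}
```

The shortcut is only valid while every pair still sits where the rule put it, which is why only partition-preserving transformations may follow the partitioning step.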
On Fri, Dec 13, 2013 at 12:07 PM, Yadid Ayzenberg <ya...@media.mit.edu> wrote:
Oops, meant to send this to the entire list...
-------- Original Message --------
Subject: Re: reading a specific key-value
Date: Fri, 13 Dec 2013 14:56:22 -0500
From: Yadid Ayzenberg <ya...@media.mit.edu>
To: K. Shankari <shank...@eecs.berkeley.edu>
It says it is more efficient if the RDD has a "known" partitioner. What
does that mean?
Yadid
On 12/13/13 2:11 PM, K. Shankari wrote:
I think that you want the lookup() method in PairRDDFunctions?
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
It is supposed to be more efficient than filter...
Shankari
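This matches what the PairRDDFunctions.lookup() documentation describes: with a known partitioner, only the partition the key maps to is searched; without one, Spark falls back to filtering the whole RDD, so lookup() is then no cheaper than filter. The plain-Java sketch below (no Spark dependency; LookupCost and its methods are hypothetical names) counts how many records each approach examines over 1,000 hash-partitioned pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical cost model: full-scan filter vs. partitioner-aware lookup.
class LookupCost {
    // Non-negative hashCode() modulo the partition count; null keys go to 0.
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Hash-partitions keys 0..n-1 (value = key * 10) into numPartitions lists.
    static List<List<Map.Entry<Integer, Integer>>> build(int n, int numPartitions) {
        List<List<Map.Entry<Integer, Integer>>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int k = 0; k < n; k++) {
            parts.get(partitionFor(k, numPartitions)).add(new SimpleEntry<>(k, k * 10));
        }
        return parts;
    }

    // Like filter on the key: examines every record in every partition.
    // scanned[0] is incremented once per record examined.
    static List<Integer> filterLookup(List<List<Map.Entry<Integer, Integer>>> parts,
                                      int key, int[] scanned) {
        List<Integer> out = new ArrayList<>();
        for (List<Map.Entry<Integer, Integer>> part : parts) {
            for (Map.Entry<Integer, Integer> e : part) {
                scanned[0]++;
                if (e.getKey() == key) {
                    out.add(e.getValue());
                }
            }
        }
        return out;
    }

    // Like lookup() with a known partitioner: examines one partition only.
    // scanned[0] is incremented once per record examined.
    static List<Integer> partitionedLookup(List<List<Map.Entry<Integer, Integer>>> parts,
                                           int key, int[] scanned) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : parts.get(partitionFor(key, parts.size()))) {
            scanned[0]++;
            if (e.getKey() == key) {
                out.add(e.getValue());
            }
        }
        return out;
    }
}
```

With build(1000, 8) and key 42, both methods return [420], but the filter examines all 1,000 records while the partition-aware lookup examines only the 125 records in partition 2. Looking up several keys multiplies the filter's cost by the number of keys.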
On Thu, Dec 12, 2013 at 7:30 PM, Yadid <ya...@media.mit.edu> wrote:
I have a pair RDD and I would like to access a specific key-value pair.
The first thing that comes to mind is filtering on the specified key,
but that seems very inefficient, since it would iterate over the
entire RDD, even more so if I need to access several keys.
Is there any other way to do this? It seems like a really useful
feature. I'm guessing that to implement it, I would need a mapping
of keys to partitions and a method to access data from a specific
partition.
Yadid