Re: reading a specific key-value

Yadid Sat, 14 Dec 2013 10:05:36 -0800

I think I'm missing some information. The log says: "Computing partition org.apache.spark.rdd.NewHadoopPartition"

What partitioner is being used for this?

Also, after creating the RDD I use JavaPairRDD.keyBy(). I'm guessing this will re-partition the values according to the new key.

What partitioner is being used in this case?
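As I understand Spark, keyBy is just a map that pairs each element with f(element). Nothing is shuffled, and records stay in the partitions they were read into (for newAPIHadoopRDD, roughly one per input split). The resulting RDD's partitioner is None, the same as the Hadoop RDD itself. The plain-Java sketch below models this without any Spark dependency. KeyBySketch and its list-of-lists representation of partitions are hypothetical. Notice that the same key can land in several partitions, so its location cannot be computed from the key alone.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model (not Spark code): keyBy as a per-partition map.
// Each element is paired with its derived key in place; no record moves
// between partitions, so there is no key-to-partition rule afterwards.
public class KeyBySketch {

    static <T, K> List<List<Map.Entry<K, T>>> keyBy(List<List<T>> partitions,
                                                    Function<T, K> f) {
        List<List<Map.Entry<K, T>>> result = new ArrayList<>();
        for (List<T> partition : partitions) {
            List<Map.Entry<K, T>> keyed = new ArrayList<>();
            for (T element : partition) {
                // Pair the element with f(element), keeping its position.
                keyed.add(new AbstractMap.SimpleImmutableEntry<>(f.apply(element), element));
            }
            result.add(keyed);
        }
        return result;
    }
}
```

For example, keying the splits [["apple", "bob"], ["cat"]] by String::length leaves key 3 in both partitions.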


Yadid

On 12/13/13, 4:49 PM, Yadid Ayzenberg wrote:
Thanks, I understand. I'm using the Java newAPIHadoopRDD. It seems that there is no way to define the partitioner when creating the RDD, correct? Does this mean I have to call partitionBy? It seems like it would be a lot more efficient to be able to define the partitioner on RDD creation.
Yadid
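In the Java API the call would look like pairRdd.partitionBy(new HashPartitioner(n)). Conceptually, it routes every record to the partition chosen by a hash of its key. The sketch below is plain Java with hypothetical names (PartitionBySketch, partitionOf), not Spark's code. It shows that every record is visited and may move, which is the one-time shuffle cost. Afterwards, each key lives in exactly one predictable partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of what partitionBy does conceptually: move every
// (key, value) record into the partition chosen by a hash of its key.
public class PartitionBySketch {

    // Hash-partitioner rule: non-negative hashCode modulo n (null -> 0).
    static int partitionOf(Object key, int numPartitions) {
        return key == null ? 0 : Math.floorMod(key.hashCode(), numPartitions);
    }

    // input: records grouped however they arrived (e.g. one list per input split).
    static <K, V> List<List<Map.Entry<K, V>>> partitionBy(
            List<List<Map.Entry<K, V>>> input, int numPartitions) {
        List<List<Map.Entry<K, V>>> output = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            output.add(new ArrayList<>());
        }
        // Every record is visited once and routed to its target partition.
        for (List<Map.Entry<K, V>> split : input) {
            for (Map.Entry<K, V> record : split) {
                output.get(partitionOf(record.getKey(), numPartitions)).add(record);
            }
        }
        return output;
    }
}
```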




On 12/13/13 3:19 PM, Mark Hamstra wrote:
It means that the partitioner (Option[Partitioner]) field of the RDD is Some(p), not None. Which, in turn, means that for a key k, the RDD knows how to find which partition contains that k.
In order for that to be true, the RDD has to have been partitioned by key, and after that only partition-preserving transformations can have been performed on the RDD[(K,V)].
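To make "knows how to find which partition contains that k" concrete: a hash partitioner computes the partition index from the key alone. The plain-Java sketch below mirrors that rule: the non-negative value of key.hashCode() modulo the number of partitions, with null keys sent to partition 0. The class name KnownPartitioner is hypothetical; this is not Spark's own code.

```java
// Hypothetical illustration (not Spark code): every key maps to exactly one
// partition index, so the location of a key is computable without any data.
public class KnownPartitioner {
    private final int numPartitions;

    public KnownPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0");
        }
        this.numPartitions = numPartitions;
    }

    public int numPartitions() {
        return numPartitions;
    }

    // Non-negative modulo of the key's hash code; null keys go to partition 0.
    public int getPartition(Object key) {
        if (key == null) {
            return 0;
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Because the index depends only on the key, a lookup for k only needs to read partition getPartition(k). A transformation that may change keys, such as map, cannot keep that guarantee. That is why the partitioner survives only partition-preserving operations such as mapValues.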
On Fri, Dec 13, 2013 at 12:07 PM, Yadid Ayzenberg <[email protected]> wrote:
    Oops, meant to send to the entire list...


    -------- Original Message --------
    Subject:    Re: reading a specific key-value
    Date:       Fri, 13 Dec 2013 14:56:22 -0500
    From:       Yadid Ayzenberg <[email protected]>
    To:         K. Shankari <[email protected]>



    It says it is more efficient if the RDD has a "known" partitioner.
    What does that mean?

    Yadid



    On 12/13/13 2:11 PM, K. Shankari wrote:
    I think that you want the lookup() method in PairRDDFunctions?
    
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions

    It is supposed to be more efficient than filter...
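As I understand Spark's implementation, lookup(key) uses the partitioner when there is one and runs a job only on the partition that can hold the key. With no partitioner, it falls back to filtering the whole RDD. So the gain over filter appears only for an RDD with a known partitioner. The plain-Java sketch below counts how many partitions each path scans. It uses a hypothetical LookupSketch class, represents partitions as lists, and passes the optional partitioner as a function from key to index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Hypothetical sketch (not Spark code) of lookup(key) over partitioned pairs.
public class LookupSketch {

    // Values found for the key, plus how many partitions were scanned.
    static final class Result<V> {
        final List<V> values;
        final int partitionsScanned;

        Result(List<V> values, int partitionsScanned) {
            this.values = values;
            this.partitionsScanned = partitionsScanned;
        }
    }

    static <K, V> Result<V> lookup(List<List<Map.Entry<K, V>>> partitions,
                                   Optional<ToIntFunction<K>> partitioner,
                                   K key) {
        List<V> values = new ArrayList<>();
        if (partitioner.isPresent()) {
            // Known partitioner: only the owning partition can contain the key.
            int index = partitioner.get().applyAsInt(key);
            scan(partitions.get(index), key, values);
            return new Result<>(values, 1);
        }
        // No partitioner: every partition has to be filtered.
        for (List<Map.Entry<K, V>> partition : partitions) {
            scan(partition, key, values);
        }
        return new Result<>(values, partitions.size());
    }

    // Collect the values of all records in one partition whose key matches.
    private static <K, V> void scan(List<Map.Entry<K, V>> partition, K key, List<V> out) {
        for (Map.Entry<K, V> e : partition) {
            if (Objects.equals(e.getKey(), key)) {
                out.add(e.getValue());
            }
        }
    }
}
```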

    Shankari



    On Thu, Dec 12, 2013 at 7:30 PM, Yadid <[email protected]> wrote:

        I have a pairRDD and I would like to access a specific
        key-value.
        The first thing that comes to mind is filtering using the
        specified key, but that seems very inefficient as that
        would iterate over the entire RDD. And even more so if I
        need to access several keys.

        Is there any other way to do this? This seems like a
        really useful feature. I'm guessing that in order to
        implement this, I would need a mapping of keys to
        partitions, and a method to access data from a specific
        partition.

        Yadid
