You can simply call myKeys.collectAsMap(), a method in
PairRDDFunctions, but note that if multiple pairs with the same
key exist in the RDD, only the last one appears in the resulting Map.
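A quick way to see that last-one-wins behaviour without a cluster: plain Scala's toMap resolves duplicate keys the same way the thread describes for collectAsMap() (the pairs below are invented for illustration).

```scala
// Invented pairs with a duplicate key; no Spark needed.
// Building a Map from them keeps only the last value per key,
// matching the collectAsMap() behaviour described above.
val pairs = Seq(("query1", 1.0), ("query1", 2.0), ("query2", 3.0))
val asMap = pairs.toMap
```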


On Sat, Jan 25, 2014 at 5:31 AM, Guillaume Pitel <[email protected]> wrote:

>  Related question about this kind of problem: what is the best way to
> get the mappings for a list of keys?
>
> Does this make sense?
>
> val myKeys = sc.parallelize(List(("query1", None), ("query2", None)))
> val resolved = myKeys.leftOuterJoin(dictionary)
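To sketch what that left join would produce, here is the same idea in plain Scala with no Spark (the dictionary contents are hypothetical): each key maps to Some(value) when present in the dictionary and None otherwise.

```scala
// Hypothetical lookup table standing in for the `dictionary` RDD above.
val dictionary = Map("query1" -> 0.5)
val keys = Seq("query1", "query2")
// A left outer join pairs every key with an Option of its dictionary value.
val resolved: Seq[(String, Option[Double])] =
  keys.map(k => (k, dictionary.get(k)))
```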
>
> Guillaume
>
> If you have a pair RDD (an RDD[(A, B)]) then you can use the .lookup()
> method on it for faster access.
>
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
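For intuition, lookup(key) returns every value stored under that key. A plain-Scala sketch of the same semantics (the sample data is invented):

```scala
// Invented sample data; lookup returns all values whose key matches,
// mirroring the shape of PairRDDFunctions.lookup described above.
val pairs = Seq(("a", 1), ("a", 2), ("b", 3))
def lookup(k: String): Seq[Int] = pairs.collect { case (`k`, v) => v }
```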
>
>  Spark's strength is running computations across a large set of data.  If
> you're trying to do fast lookup of a few individual keys, I'd recommend
> something more like memcached or Elasticsearch.
>
>
> On Fri, Jan 24, 2014 at 1:11 PM, Manoj Samel <[email protected]> wrote:
>
>> Yes, that works.
>>
>>  But then the fast key lookup of the hashmap is gone, and the search
>> becomes a linear scan with an iterator. I'm not sure whether Spark
>> internally adds optimizations for Seq, but otherwise one has to assume
>> this becomes a List/Array without the fast key lookup of a hashmap or
>> b-tree.
>>
>>  Any thoughts?
>>
>>
>> On Fri, Jan 24, 2014 at 1:00 PM, Frank Austin Nothaft <
>> [email protected]> wrote:
>>
>>> Manoj,
>>>
>>> I assume you’re trying to create an RDD[(String, Double)]? Couldn’t you
>>> just do:
>>>
>>> val cr_rdd = sc.parallelize(cr.toSeq)
>>>
>>> The toSeq would convert the HashMap[String,Double] into a Seq[(String,
>>> Double)] before calling the parallelize function.
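Frank's conversion step is runnable without Spark; a minimal sketch of just the toSeq part (the map contents are invented):

```scala
import scala.collection.mutable.HashMap

// toSeq turns the HashMap[String, Double] into a Seq[(String, Double)],
// which is the shape sc.parallelize accepts.
val cr = HashMap("query1" -> 1.0, "query2" -> 2.0)
val asSeq: Seq[(String, Double)] = cr.toSeq
```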
>>>
>>> Regards,
>>>
>>> Frank Austin Nothaft
>>> [email protected]
>>> [email protected]
>>> 202-340-0466
>>>
>>> On Jan 24, 2014, at 12:56 PM, Manoj Samel <[email protected]>
>>> wrote:
>>>
>>> > Is there a way to create an RDD over a hashmap?
>>> >
>>> > If I have a hash map and try sc.parallelize, it gives
>>> >
>>> > <console>:17: error: type mismatch;
>>> >  found   : scala.collection.mutable.HashMap[String,Double]
>>> >  required: Seq[?]
>>> > Error occurred in an application involving default arguments.
>>> >        val cr_rdd = sc.parallelize(cr)
>>> >                                    ^
>>>
>>>
>>
>
>
> --
>  *Guillaume PITEL, Président*
> +33(0)6 25 48 86 80 / +33(0)9 70 44 67 53
>
>  eXenSa S.A.S. <http://www.exensa.com/>
>  41, rue Périer - 92120 Montrouge - FRANCE
> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
>
