If you have a pair RDD (an RDD[(A, B)]), then you can use the .lookup() method on it for faster access.
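A minimal sketch of what that could look like in the spark-shell (the names and
values here are made up; sc is assumed to be the shell's SparkContext):

  val prices = sc.parallelize(Seq(("AAPL", 560.0), ("GOOG", 1120.0)))
  // lookup(key) returns every value stored under that key, as a Seq
  val apple = prices.lookup("AAPL")   // Seq(560.0)

Note that each lookup() still scans the RDD's partitions (or only the partition
that owns the key, if the RDD has a partitioner), so it suits occasional
lookups rather than serving as a key/value store. See: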
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions

Spark's strength is running computations across a large set of data. If you're
trying to do fast lookup of a few individual keys, I'd recommend something more
like memcached or Elasticsearch.

On Fri, Jan 24, 2014 at 1:11 PM, Manoj Samel <[email protected]> wrote:

> Yes, that works.
>
> But then the hashmap functionality of the fast key lookup etc. is gone, and
> the search will be linear using an iterator etc. Not sure if Spark
> internally creates additional optimizations for Seq, but otherwise one has
> to assume this becomes a List/Array without the fast key lookup of a
> hashmap or b-tree.
>
> Any thoughts?
>
>
> On Fri, Jan 24, 2014 at 1:00 PM, Frank Austin Nothaft <
> [email protected]> wrote:
>
>> Manoj,
>>
>> I assume you're trying to create an RDD[(String, Double)]? Couldn't you
>> just do:
>>
>>   val cr_rdd = sc.parallelize(cr.toSeq)
>>
>> The toSeq would convert the HashMap[String, Double] into a Seq[(String,
>> Double)] before calling the parallelize function.
>>
>> Regards,
>>
>> Frank Austin Nothaft
>> [email protected]
>> [email protected]
>> 202-340-0466
>>
>> On Jan 24, 2014, at 12:56 PM, Manoj Samel <[email protected]> wrote:
>>
>> > Is there a way to create an RDD over a hashmap?
>> >
>> > If I have a hash map and try sc.parallelize, it gives
>> >
>> > <console>:17: error: type mismatch;
>> >  found   : scala.collection.mutable.HashMap[String,Double]
>> >  required: Seq[?]
>> > Error occurred in an application involving default arguments.
>> >        val cr_rdd = sc.parallelize(cr)
>> >                                    ^
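For completeness, a self-contained sketch of the workaround discussed above
(the map contents are invented; sc is the shell's SparkContext, and in a
standalone 0.9-era program you would typically also need
import org.apache.spark.SparkContext._ to pick up the pair-RDD implicits):

  import scala.collection.mutable

  val cr = mutable.HashMap("a" -> 1.0, "b" -> 2.0, "c" -> 3.0)

  // parallelize expects a Seq, so convert the map to Seq[(String, Double)] first
  val cr_rdd = sc.parallelize(cr.toSeq)     // RDD[(String, Double)]

  // the result is a pair RDD, so lookup() is available, but there is no
  // hash index behind it: each call scans the data (or one partition,
  // if the RDD has been partitioned by key)
  cr_rdd.lookup("b")                        // Seq(2.0)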
