Related question about this kind of problem: what is the best way to get the mappings for a list of keys?

Does this make sense?

val myKeys = sc.parallelize(List(("query1", None), ("query2", None)))
val resolved = myKeys.leftOuterJoin(dictionary)
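For concreteness, here is a sketch of what that yields, assuming dictionary is an RDD[(String, String)] (contents made up for illustration):

// hypothetical dictionary, just for illustration
val dictionary = sc.parallelize(List(("query1", "result1"), ("query3", "result3")))
// leftOuterJoin keeps every key from myKeys; keys absent from the
// dictionary come back as None
val resolved = myKeys.leftOuterJoin(dictionary).mapValues(_._2)
// resolved: RDD[(String, Option[String])]
// holds ("query1", Some("result1")) and ("query2", None)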

Guillaume
If you have a pair RDD (an RDD[(A, B)]), then you can use the .lookup() method on it for faster access.
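For example (a minimal sketch; the contents are made up):

val dictionary = sc.parallelize(List(("query1", 1.0), ("query2", 2.0)))
// lookup() returns every value for the key as a Seq; with a known
// partitioner it scans only the partition that can hold the key,
// otherwise it filters all partitions
val hits: Seq[Double] = dictionary.lookup("query1")   // Seq(1.0)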


Spark's strength is running computations across a large set of data.  If you're trying to do fast lookup of a few individual keys, I'd recommend something more like memcached or Elasticsearch.


On Fri, Jan 24, 2014 at 1:11 PM, Manoj Samel <[email protected]> wrote:
Yes, that works.

But then the hashmap functionality of fast key lookup is gone, and the search becomes linear, using an iterator. I'm not sure whether Spark internally adds optimizations for a Seq, but otherwise one has to assume this becomes a List/Array without the fast key lookup of a hashmap or b-tree.

Any thoughts?
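A couple of options to get fast key access back (a sketch reusing the cr_rdd from Frank's snippet below; the partitioning and collectAsMap choices here are my own suggestion, not from the thread):

import org.apache.spark.HashPartitioner

// After hash-partitioning, lookup(key) scans only the single partition
// that can hold the key instead of every partition
val partitioned = cr_rdd.partitionBy(new HashPartitioner(8)).cache()
partitioned.lookup("someKey")          // Seq[Double]; "someKey" is illustrative

// For a small RDD, collectAsMap() rebuilds an ordinary driver-side Map
// with O(1) hash lookups
val localMap = cr_rdd.collectAsMap()   // scala.collection.Map[String, Double]
localMap.get("someKey")                // Option[Double]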





On Fri, Jan 24, 2014 at 1:00 PM, Frank Austin Nothaft <[email protected]> wrote:
Manoj,

I assume you’re trying to create an RDD[(String, Double)]? Couldn’t you just do:

val cr_rdd = sc.parallelize(cr.toSeq)

The toSeq would convert the HashMap[String,Double] into a Seq[(String, Double)] before calling the parallelize function.
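Self-contained, that looks something like this (the map contents are made up):

import scala.collection.mutable.HashMap

val cr = HashMap("query1" -> 1.0, "query2" -> 2.0)
// toSeq produces a Seq[(String, Double)], which parallelize accepts
val cr_rdd = sc.parallelize(cr.toSeq)
// cr_rdd: org.apache.spark.rdd.RDD[(String, Double)]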

Regards,

Frank Austin Nothaft
[email protected]
[email protected]
202-340-0466

On Jan 24, 2014, at 12:56 PM, Manoj Samel <[email protected]> wrote:

> Is there a way to create an RDD over a hashmap?
>
> If I have a hash map and try sc.parallelize, it gives
>
> <console>:17: error: type mismatch;
>  found   : scala.collection.mutable.HashMap[String,Double]
>  required: Seq[?]
> Error occurred in an application involving default arguments.
>        val cr_rdd = sc.parallelize(cr)
>                                    ^





--
eXenSa
Guillaume PITEL, Président
+33(0)6 25 48 86 80 / +33(0)9 70 44 67 53

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
