Wow, it really was that easy! The implicit joining works a treat.
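
For reference, a minimal sketch of the pattern as a standalone app (the ages/cities
maps and the local[*] context are just hypothetical stand-ins; in the shell, sc and
the pair-RDD implicits are already in scope):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (join etc.) in Spark 1.x

object MapJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("map-join").setMaster("local[*]"))

    // Hypothetical local maps standing in for the real source data
    val ages   = Map("alice" -> 30, "bob" -> 25)
    val cities = Map("alice" -> "Leeds", "bob" -> "York")

    // Map#toSeq yields a Seq[(K, V)], which parallelize accepts directly
    val agesRdd   = sc.parallelize(ages.toSeq)     // RDD[(String, Int)]
    val citiesRdd = sc.parallelize(cities.toSeq)   // RDD[(String, String)]

    // join is available because RDD[(K, V)] picks up PairRDDFunctions implicitly
    agesRdd.join(citiesRdd).collect().foreach(println)   // e.g. (alice,(30,Leeds))

    sc.stop()
  }
}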

Many thanks,
Jon

On 13 October 2014 22:58, Stephen Boesch <java...@gmail.com> wrote:

> is the following what you are looking for?
>
>
> scala> sc.parallelize(myMap.map{ case (k,v) => (k,v) }.toSeq)
> res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:21
>
>
>
> 2014-10-13 14:02 GMT-07:00 jon.g.massey <jon.g.mas...@gmail.com>:
>
>> Hi guys,
>> Just starting out with Spark and following through a few tutorials, it seems
>> the easiest way to get one's source data into an RDD is using the
>> sc.parallelize function. Unfortunately, my local data is in multiple
>> instances of Map<K,V> types, and the parallelize function only works on
>> objects with the Seq trait, and produces an RDD which seemingly doesn't then
>> have the notion of keys and values which I require for joins (amongst other
>> functions).
>>
>> Is there a way of using a SparkContext to create a distributed RDD from a
>> local Map, rather than from a Hadoop or text file source?
>>
>> Thanks,
>> Jon
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/distributing-Scala-Map-datatypes-to-RDD-tp16320.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
