Re: distributing Scala Map datatypes to RDD

2014-10-16 Thread Jon Massey
Wow, it really was that easy! The implicit joining works a treat.

Many thanks,
Jon

On 13 October 2014 22:58, Stephen Boesch java...@gmail.com wrote:

 is the following what you are looking for?


 scala> sc.parallelize(myMap.map{ case (k,v) => (k,v) }.toSeq)
 res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0]
 at parallelize at <console>:21



 2014-10-13 14:02 GMT-07:00 jon.g.massey jon.g.mas...@gmail.com:

 Hi guys,
 Just starting out with Spark and following through a few tutorials, it
 seems the easiest way to get one's source data into an RDD is using the
 sc.parallelize function. Unfortunately, my local data is in multiple
 instances of Map[K,V] types, and the parallelize function only works on
 objects with the Seq trait, and produces an RDD which seemingly doesn't
 then have the notion of keys and values which I require for joins
 (amongst other functions).

 Is there a way of using a SparkContext to create a distributed RDD from a
 local Map, rather than from a Hadoop or text file source?

 Thanks,
 Jon



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/distributing-Scala-Map-datatypes-to-RDD-tp16320.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: distributing Scala Map datatypes to RDD

2014-10-13 Thread Stephen Boesch
is the following what you are looking for?


scala> sc.parallelize(myMap.map{ case (k,v) => (k,v) }.toSeq)
res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at
parallelize at <console>:21



2014-10-13 14:02 GMT-07:00 jon.g.massey jon.g.mas...@gmail.com:

 Hi guys,
 Just starting out with Spark and following through a few tutorials, it
 seems the easiest way to get one's source data into an RDD is using the
 sc.parallelize function. Unfortunately, my local data is in multiple
 instances of Map[K,V] types, and the parallelize function only works on
 objects with the Seq trait, and produces an RDD which seemingly doesn't
 then have the notion of keys and values which I require for joins
 (amongst other functions).

 Is there a way of using a SparkContext to create a distributed RDD from a
 local Map, rather than from a Hadoop or text file source?

 Thanks,
 Jon







Re: distributing Scala Map datatypes to RDD

2014-10-13 Thread Sean Owen
Map.toSeq already does that, so you can skip the map. You can put
together Maps with ++ too. You should have an RDD of pairs then, but
to get the special pair-RDD functions you're looking for, remember to
import SparkContext._
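Putting those suggestions together, a minimal sketch might look like the
following (the map contents, app name, and object name are made up for
illustration; this assumes Spark is on the classpath and a local master):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // brings in the pair-RDD functions (join, reduceByKey, ...)

object MapToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("map-to-rdd").setMaster("local[*]"))

    // Combine the local Maps with ++, then hand the result to parallelize;
    // Map.toSeq yields a Seq[(String, Int)], which is all parallelize needs.
    val m1 = Map("a" -> 1, "b" -> 2)
    val m2 = Map("c" -> 3)
    val rdd = sc.parallelize((m1 ++ m2).toSeq)

    // Because the element type is a pair, key-based operations such as join work.
    val other = sc.parallelize(Seq(("a", "x"), ("c", "y")))
    println(rdd.join(other).collect().toMap)

    sc.stop()
  }
}
```

Note that with ++, values from the right-hand map win on duplicate keys, which
may or may not be what you want when merging several maps.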

On Mon, Oct 13, 2014 at 10:58 PM, Stephen Boesch java...@gmail.com wrote:
 is the following what you are looking for?


 scala> sc.parallelize(myMap.map{ case (k,v) => (k,v) }.toSeq)
 res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0]
 at parallelize at <console>:21



 2014-10-13 14:02 GMT-07:00 jon.g.massey jon.g.mas...@gmail.com:

 Hi guys,
 Just starting out with Spark and following through a few tutorials, it
 seems the easiest way to get one's source data into an RDD is using the
 sc.parallelize function. Unfortunately, my local data is in multiple
 instances of Map[K,V] types, and the parallelize function only works on
 objects with the Seq trait, and produces an RDD which seemingly doesn't
 then have the notion of keys and values which I require for joins
 (amongst other functions).

 Is there a way of using a SparkContext to create a distributed RDD from a
 local Map, rather than from a Hadoop or text file source?

 Thanks,
 Jon






-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org