Re: distributing Scala Map datatypes to RDD
Wow, it really was that easy! The implicit joining works a treat. Many thanks,
Jon

On 13 October 2014 22:58, Stephen Boesch <java...@gmail.com> wrote:
> Is the following what you are looking for?
>
> scala> sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
> res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at
> parallelize at <console>:21
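For anyone landing on this thread later, here is a minimal end-to-end sketch of the pattern Jon describes, written for the Spark 1.x shell (where sc is predefined, and importing SparkContext._ brings the pair-RDD implicits, including join, into scope). The map names and contents (ages, cities) are made up for illustration:

    import org.apache.spark.SparkContext._   // pair-RDD functions (join, etc.) in Spark 1.x

    // Hypothetical local maps standing in for the real source data
    val ages   = Map("alice" -> 30, "bob" -> 25)
    val cities = Map("alice" -> "Leeds", "bob" -> "York")

    // Map.toSeq gives a Seq[(K, V)], which sc.parallelize accepts
    val agesRdd   = sc.parallelize(ages.toSeq)
    val citiesRdd = sc.parallelize(cities.toSeq)

    // With the implicits in scope, RDDs of pairs get join and friends
    val joined = agesRdd.join(citiesRdd)   // RDD[(String, (Int, String))]
    joined.collect().foreach(println)      // (alice,(30,Leeds)), (bob,(25,York))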
Re: distributing Scala Map datatypes to RDD
Is the following what you are looking for?

scala> sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at
parallelize at <console>:21

2014-10-13 14:02 GMT-07:00 jon.g.massey <jon.g.mas...@gmail.com>:
> Hi guys,
> Just starting out with Spark and following a few tutorials, it seems the
> easiest way to get one's source data into an RDD is the sc.parallelize
> function. Unfortunately, my local data is in multiple instances of
> Map[K,V] types, and parallelize only works on objects with the Seq trait.
> It also produces an RDD which seemingly doesn't have the notion of keys
> and values that I need for joins (amongst other functions). Is there a
> way of using a SparkContext to create a distributed RDD from a local Map,
> rather than from a Hadoop or text file source?
> Thanks,
> Jon
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/distributing-Scala-Map-datatypes-to-RDD-tp16320.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
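As a concrete, runnable variant of the snippet above for the Spark 1.x shell, with a hypothetical map filled in (the .map { case (k, v) => (k, v) } step is an identity transform, kept here only to mirror the original):

    // A small example Map; keys and values are made up
    val myMap = Map("a" -> 1, "b" -> 2, "c" -> 3)

    // parallelize wants a Seq, so convert the Map's (key, value) pairs first
    val rdd = sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
    // rdd: org.apache.spark.rdd.RDD[(String, Int)]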
Re: distributing Scala Map datatypes to RDD
Map.toSeq already does that, even; you can skip the map. You can also put
Maps together with ++. You should then have an RDD of pairs, but to get the
special pair-RDD functions you're looking for, remember to import
SparkContext._

On Mon, Oct 13, 2014 at 10:58 PM, Stephen Boesch <java...@gmail.com> wrote:
> Is the following what you are looking for?
>
> scala> sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
> res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0]
> at parallelize at <console>:21
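A short sketch of those two points (skipping the identity map, and merging several Maps with ++ before parallelizing), again with hypothetical data and assuming the Spark 1.x shell; note that on duplicate keys, ++ keeps the right-hand Map's value:

    import org.apache.spark.SparkContext._   // needed in Spark 1.x for join, reduceByKey, ...

    // Several hypothetical local Maps
    val batch1 = Map("a" -> 1, "b" -> 2)
    val batch2 = Map("c" -> 3)

    // ++ merges the Maps; toSeq alone is enough, no identity map needed
    val pairs = sc.parallelize((batch1 ++ batch2).toSeq)   // RDD[(String, Int)]
    pairs.reduceByKey(_ + _).collect()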