`RDD.foreach` runs on the executors. You should use `collect` to fetch the data to the driver. E.g.,
    myRdd.collect().foreach { node => mp(node) = 1 }

Best Regards,
Shixiong Zhu

2015-02-25 4:00 GMT+08:00 Vijayasarathy Kannan <kvi...@vt.edu>:

> Thanks, but it still doesn't seem to work.
>
> Below is my entire code.
>
>     var mp = scala.collection.mutable.Map[VertexId, Int]()
>
>     var myRdd = graph.edges.groupBy[VertexId](f).flatMap {
>       edgesBySrc => func(edgesBySrc, a, b)
>     }
>
>     myRdd.foreach {
>       node => {
>         mp(node) = 1
>       }
>     }
>
> Values in "mp" do not get updated for any element in "myRdd".
>
> On Tue, Feb 24, 2015 at 2:39 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> Instead of
>>
>>     ...foreach {
>>       edgesBySrc => {
>>         lst ++= func(edgesBySrc)
>>       }
>>     }
>>
>> try
>>
>>     ...flatMap { edgesBySrc => func(edgesBySrc) }
>>
>> or even more succinctly
>>
>>     ...flatMap(func)
>>
>> This returns an RDD that basically has the list you are trying to
>> build, I believe.
>>
>> You can collect() to the driver, but beware if it is a huge data set.
>>
>> If you really just mean to count the results, you can count() instead.
>>
>> On Tue, Feb 24, 2015 at 7:35 PM, Vijayasarathy Kannan <kvi...@vt.edu> wrote:
>> > I am a beginner to Scala/Spark. Could you please elaborate on how to
>> > make an RDD of the results of func() and collect it?
>> >
>> > On Tue, Feb 24, 2015 at 2:27 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> They aren't the same 'lst'. One is on your driver. It gets copied to
>> >> the executors when the tasks are executed. Those copies are updated,
>> >> but the updates will never be reflected in the local copy back on
>> >> the driver.
>> >>
>> >> You may just wish to make an RDD of the results of func() and
>> >> collect() them back to the driver.
>> >>
>> >> On Tue, Feb 24, 2015 at 7:20 PM, kvvt <kvi...@vt.edu> wrote:
>> >> > I am working on the below piece of code.
>> >> >
>> >> >     var lst = scala.collection.mutable.MutableList[VertexId]()
>> >> >     graph.edges.groupBy[VertexId](f).foreach {
>> >> >       edgesBySrc => {
>> >> >         lst ++= func(edgesBySrc)
>> >> >       }
>> >> >     }
>> >> >
>> >> >     println(lst.length)
>> >> >
>> >> > Here, the final println() always says that the length of the list
>> >> > is 0. The list is non-empty (it correctly prints the length of the
>> >> > returned list inside func()).
>> >> >
>> >> > I am not sure if I am doing the append correctly. Can someone point
>> >> > out what I am doing wrong?
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Not-able-to-update-collections-tp21790.html
>> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> >> > For additional commands, e-mail: user-h...@spark.apache.org
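The pattern the thread converges on (build the results with `flatMap`, bring them to the driver with `collect()`, and only then mutate a local collection) can be sketched in a small self-contained program. Names here are hypothetical stand-ins: a local `Seq` plays the role of the RDD and a dummy `func` replaces the one from the original code (which wasn't shown), since running a real SparkContext isn't needed to illustrate the shape of the fix.

```scala
// Minimal sketch of the corrected pattern, assuming stand-ins for the
// original code: a local Seq in place of graph.edges, and a dummy func.
// With a real RDD, the flatMap runs on the executors and collect() brings
// the results back to the driver; the local map is only mutated after that.
object CollectPattern {
  type VertexId = Long

  // Dummy func: pretend each source vertex yields itself twice.
  def func(src: VertexId): Seq[VertexId] = Seq(src, src)

  def main(args: Array[String]): Unit = {
    val mp = scala.collection.mutable.Map[VertexId, Int]()

    // "Distributed" part: on a real RDD this would be myRdd.flatMap(func)
    val results = Seq(1L, 2L, 3L).flatMap(func)

    // Driver-side part: after collect(), foreach runs locally, so
    // mutating mp actually sticks (unlike foreach directly on the RDD).
    results.foreach { node => mp(node) = 1 }

    println(mp.toSeq.sortBy(_._1).mkString(", "))  // (1,1), (2,1), (3,1)
  }
}
```

The key point is that the mutation happens on the driver after `collect()`; calling `foreach` directly on the RDD would mutate per-executor copies of `mp` that are thrown away when each task finishes.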