My guess is you need to use a map for this. foreach is meant for side effects, and I'm not sure mutating the object in place is an intended use — foreach runs on deserialized copies of your objects on the workers, so the mutations never make it back to the driver. Also, the objects in an RDD are supposed to be immutable; yours isn't (it has a `var` field).
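A minimal sketch of the map approach (assuming the same `sc` and toy class from your snippet; `updated` is just an illustrative name):

```scala
// Sketch: build new immutable objects with map instead of mutating in place.
// foreach mutates deserialized copies on the workers, so those changes
// never reach the driver's view of the RDD.
case class A(a: Int) // note: no 'var' -- the class stays immutable

val as = sc.parallelize(List(A(1), A(2)))

// map produces a new RDD of new objects; the original RDD is untouched
val updated = as.map(_.copy(a = 100))

updated.collect() // Array(A(100), A(100))
```

Since `A` is a case class, `copy` gives you a cheap way to derive a modified instance without mutation, which fits Spark's model of transformations producing new RDDs.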
On Tue, Nov 5, 2013 at 4:40 PM, Hao REN <[email protected]> wrote:
> Hi,
>
> Just a quick question:
>
> When playing Spark with my toy code as below, I get some unexpected
> results.
>
> case class A(var a: Int) {
>   def setA() = { a = 100 }
> }
>
> val as = sc.parallelize(List(A(1), A(2))) // it is a RDD[A]
>
> as.foreach(_.setA())
>
> as.collect // it gives Array[this.A] = Array(A(1), A(2))
>
> The result expected is Array(A(100), A(100)). I am just trying to update
> the content of the objects of A which reside in RDD.
>
> 1) Does the foreach do the right thing?
> 2) Which is the best way to update the object in RDD, use 'map' instead?
>
> Thank you.
>
> Hao
>
> --
> REN Hao
>
> Data Engineer @ ClaraVista
>
> Paris, France
>
> Tel: +33 06 14 54 57 24
