Neither map nor mapPartitions mutates an RDD -- if you establish an immutable reference to an RDD (e.g., in Scala, val rdd = ...), the data contained within that RDD will be the same regardless of any map or mapPartitions transformations applied to it. However, if you re-assign the reference to point to the transformed RDD (as you do with myRDD = myRDD.mapPartitions(...)), then you've lost the reference to the original (un-mutated) state of the RDD and have only a reference to the RDD with the transformation applied. That doesn't make the RDD mutable, nor does it make either map or mapPartitions a side-effecting mutator -- you've just changed which point in a lineage of transformations your mutable myRDD reference points to.
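
To make the distinction concrete, here is a minimal sketch in plain Java. ToyRdd is a hypothetical stand-in invented for this illustration, not Spark's API: it only mimics the relevant property that a transformation returns a new object and leaves its input untouched. The names original and myRdd are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical toy stand-in for an RDD (NOT Spark's API): an immutable wrapper around a list.
final class ToyRdd<T> {
    private final List<T> data;

    ToyRdd(List<T> data) {
        // Defensive copy, exposed only through an unmodifiable view.
        this.data = Collections.unmodifiableList(new ArrayList<>(data));
    }

    // Like a transformation: builds and returns a NEW ToyRdd; this one is left untouched.
    <U> ToyRdd<U> map(Function<T, U> f) {
        List<U> out = new ArrayList<>();
        for (T x : data) {
            out.add(f.apply(x));
        }
        return new ToyRdd<>(out);
    }

    List<T> collect() {
        return data;
    }

    public static void main(String[] args) {
        // Immutable reference: always points at the untransformed data.
        final ToyRdd<Integer> original = new ToyRdd<>(Arrays.asList(1, 2, 3));

        // Mutable reference: re-assigning it moves it along the chain of transformations.
        ToyRdd<Integer> myRdd = original;
        myRdd = myRdd.map(x -> x * 10);

        System.out.println(original.collect()); // [1, 2, 3]
        System.out.println(myRdd.collect());    // [10, 20, 30]
    }
}
```

The data behind original never changes; only the variable myRdd was re-pointed. Keeping a separate final reference to each stage you still need is the simplest way to avoid the confusion described in the quoted message.
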
On Mon, Dec 9, 2013 at 11:06 AM, Yadid Ayzenberg <[email protected]> wrote:

> Hi all,
>
> Im noticing some strange behavior when running mapPartitions. Pseudo code:
>
> JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
> myRDD.mapPartitions( func )
>
> myRDD.count()
>
> ArrayList<Tuple2<Integer, Tuple2<List<Tuple2<Double, Double>>,
> List<Tuple2<Double, Double>>>>>tempRDD = myRDD.mapPartitions(func2 )
>
> tempRDD.count()
>
> JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
> myRDD.mapPartitions( func )
>
> It seems that mapPartitions has side-effects. When I try running the last
> line - its seems that contents of myRDD have been changed by the previous
> map. I thought the RDD were immutable and that It was only possible to
> generate new RDDs using map. Is this incorrect?
>
> Thanks,
> Yadid
