Neither map nor mapPartitions mutates an RDD -- if you establish an
immutable reference to an RDD (e.g., in Scala, val rdd = ...), the data
contained within that RDD will be the same regardless of any map or
mapPartitions transformations applied to it.  However, if you re-assign
the reference to point to the transformed RDD (as you do with myRDD =
myRDD.mapPartitions(...)), then you've lost the reference to the original
(un-mutated) RDD and have only a reference to the
RDD-with-transformation-applied.  That doesn't make the RDD mutable, nor
does it make either map or mapPartitions a side-effecting mutator -- you've
just changed which point in the lineage of transformations your mutable
myRDD reference points to.
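
To make the distinction concrete without needing a cluster, here is a
minimal plain-Java sketch. MiniRdd is a hypothetical stand-in I made up for
illustration, not a Spark class; it only mimics the one property that
matters here, namely that a transformation returns a new object and leaves
the one it was called on untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an RDD (NOT a Spark class): an immutable
// wrapper around a list of elements.
final class MiniRdd<T> {
    private final List<T> data;

    MiniRdd(List<T> data) {
        // Defensive copy, so later changes to the caller's list cannot leak in.
        this.data = Collections.unmodifiableList(new ArrayList<>(data));
    }

    // Plays the role of a transformation: builds and returns a NEW MiniRdd
    // and never modifies the one it is called on.
    <U> MiniRdd<U> map(Function<T, U> f) {
        List<U> out = new ArrayList<>();
        for (T x : data) {
            out.add(f.apply(x));
        }
        return new MiniRdd<>(out);
    }

    // Returns the (unmodifiable) contents.
    List<T> collect() {
        return data;
    }
}

class LineageDemo {
    public static void main(String[] args) {
        // Immutable reference, the analogue of Scala's "val rdd = ...".
        final MiniRdd<Integer> original = new MiniRdd<>(Arrays.asList(1, 2, 3));

        // Mutable reference that is re-pointed at the transformed data,
        // as in "myRDD = myRDD.mapPartitions(...)".
        MiniRdd<Integer> myRdd = original;
        myRdd = myRdd.map(x -> x * 10);

        System.out.println(original.collect()); // [1, 2, 3]
        System.out.println(myRdd.collect());    // [10, 20, 30]
    }
}
```

The object behind "original" never changes; only the variable myRdd was
moved one step further along the chain of transformations. If you need the
untransformed data later, keep a separate reference to it instead of
re-using the same variable.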


On Mon, Dec 9, 2013 at 11:06 AM, Yadid Ayzenberg <[email protected]> wrote:

>
> Hi all,
>
> I'm noticing some strange behavior when running mapPartitions. Pseudo code:
>
> JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
> myRDD.mapPartitions( func )
>
> myRDD.count()
>
> ArrayList<Tuple2<Integer, Tuple2<List<Tuple2<Double, Double>>,
> List<Tuple2<Double, Double>>>>> tempRDD = myRDD.mapPartitions( func2 )
>
> tempRDD.count()
>
>
> JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
> myRDD.mapPartitions( func )
>
>
> It seems that mapPartitions has side effects. When I try running the last
> line, it seems that the contents of myRDD have been changed by the previous
> map. I thought that RDDs were immutable and that it was only possible to
> generate new RDDs using map. Is this incorrect?
>
>
> Thanks,
> Yadid
>
>
