yes, I understand that I lost the original reference to my RDD.
my questions is regarding the new reference (made by tempRDD). I should
have 2 references to two different points in the lineage chain.
One is myRDD and the other is tempRDD. it seems that after running the
transformation referred to by tempRDD, I have altered myRDD as well.
(or at least that's what I think is happening). When I try to run an
additional transformation, it seems that that myRDD is now pointing to a
point in the lineage which is after the 2nd transformation.
Yadid
On 12/9/13 2:17 PM, Mark Hamstra wrote:
Neither map nor mapPartitions mutates an RDD -- if you establish an
immutable reference to an rdd (e.g., in Scala, val rdd = ...), the
data contained within that RDD will be the same regardless of any map
or mapPartition transformations. However, if you re-assign the
reference to point to the transformed RDD (as you do with myRDD =
myRDD.mapPartitions(...)), then you've lost the reference to the
original (un-mutated) state of the RDD and have only a reference to
the RDD-with-tranformation-applied. That doesn't make the RDD mutable
nor does it make either map or mapPartitions a side-effecting mutator
-- you've just changed where in a lineage of transformations you are
pointing to with your mutable myRDD reference.
On Mon, Dec 9, 2013 at 11:06 AM, Yadid Ayzenberg <[email protected]
<mailto:[email protected]>> wrote:
Hi all,
Im noticing some strange behavior when running mapPartitions.
Pseudo code:
JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
myRDD.mapPartitions( func )
myRDD.count()
ArrayList<Tuple2<Integer, Tuple2<List<Tuple2<Double, Double>>,
List<Tuple2<Double, Double>>>>>tempRDD = myRDD.mapPartitions(func2 )
tempRDD.count()
JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
myRDD.mapPartitions( func )
It seems that mapPartitions has side-effects. When I try running
the last line - its seems that contents of myRDD have been changed
by the previous map. I thought the RDD were immutable and that It
was only possible to generate new RDDs using map. Is this incorrect?
Thanks,
Yadid