Yes, I understand that I lost the original reference to my RDD.
My question is about the new reference (tempRDD). I should have two references to two different points in the lineage chain: one is myRDD and the other is tempRDD. It seems that after running the transformation referred to by tempRDD, I have altered myRDD as well (or at least that's what I think is happening). When I try to run an additional transformation, it seems that myRDD now points to a place in the lineage that comes after the 2nd transformation.

Yadid


On 12/9/13 2:17 PM, Mark Hamstra wrote:
Neither map nor mapPartitions mutates an RDD -- if you establish an immutable reference to an RDD (e.g., in Scala, val rdd = ...), the data contained within that RDD will be the same regardless of any map or mapPartitions transformations. However, if you re-assign the reference to point to the transformed RDD (as you do with myRDD = myRDD.mapPartitions(...)), then you've lost the reference to the original (un-mutated) state of the RDD and have only a reference to the RDD-with-transformation-applied. That doesn't make the RDD mutable, nor does it make either map or mapPartitions a side-effecting mutator -- you've just changed which point in a lineage of transformations your mutable myRDD reference points to.
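
To make the reference-versus-data distinction concrete, here is a minimal sketch in plain Java. It does not use Spark at all: Dataset is a hypothetical stand-in for an RDD (immutable rows, transformations that return a new object), and the names LineageSketch, original, myData and temp are made up for illustration. It only models the reference semantics described above, not Spark's lazy evaluation or caching.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LineageSketch {

    /** Hypothetical stand-in for an RDD: immutable rows, transformations return new objects. */
    static final class Dataset<T> {
        private final List<T> rows;

        Dataset(List<T> rows) {
            this.rows = List.copyOf(rows); // defensive, unmodifiable copy
        }

        /** Returns a NEW Dataset; this one is left untouched. */
        <R> Dataset<R> map(Function<T, R> f) {
            return new Dataset<>(rows.stream().map(f).collect(Collectors.toList()));
        }

        /** An "action": reads the data, changes nothing. */
        long count() {
            return rows.size();
        }

        List<T> collect() {
            return rows;
        }
    }

    public static void main(String[] args) {
        Dataset<Integer> original = new Dataset<>(List.of(1, 2, 3));

        // Mutable reference, initially pointing at the original data.
        Dataset<Integer> myData = original;

        // Re-assignment: myData now points one step further along the lineage.
        // The object that 'original' refers to is not modified.
        myData = myData.map(x -> x * 10);

        // A second reference, one more step along; running an action on it
        // does not move or alter myData.
        Dataset<String> temp = myData.map(x -> "v" + x);
        temp.count();

        System.out.println(original.collect()); // [1, 2, 3]
        System.out.println(myData.collect());   // [10, 20, 30]
        System.out.println(temp.collect());     // [v10, v20, v30]
    }
}
```

In this model, the only way myData "changes" is through the explicit re-assignment; keeping a second, never-reassigned reference (original here) is what preserves access to the earlier point in the lineage.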


On Mon, Dec 9, 2013 at 11:06 AM, Yadid Ayzenberg wrote:


    Hi all,

    I'm noticing some strange behavior when running mapPartitions.
    Pseudo code:

    JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
    myRDD.mapPartitions( func )

    myRDD.count()

    ArrayList<Tuple2<Integer, Tuple2<List<Tuple2<Double, Double>>,
    List<Tuple2<Double, Double>>>>> tempRDD = myRDD.mapPartitions( func2 )

    tempRDD.count()


    JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
    myRDD.mapPartitions( func )


    It seems that mapPartitions has side effects. When I try running
    the last line, it seems that the contents of myRDD have been changed
    by the previous map. I thought RDDs were immutable and that it
    was only possible to generate new RDDs using map. Is this incorrect?


    Thanks,
    Yadid


