There is no circular dependency. Its simply dropping references to prev RDDs because there is no need for it.
I wonder if that messes up things up though internally for Spark due to losing references to intermediate RDDs. > On Oct 8, 2014, at 12:13 PM, Akshat Aranya <[email protected]> wrote: > > Using a var for RDDs in this way is not going to work. In this example, > tx1.zip(tx2) would create and RDD that depends on tx2, but then soon after > that, you change what tx2 means, so you would end up having a circular > dependency. > >> On Wed, Oct 8, 2014 at 12:01 PM, Sung Hwan Chung <[email protected]> >> wrote: >> My job is not being fault-tolerant (e.g., when there's a fetch failure or >> something). >> >> The lineage of RDDs are constantly updated every iteration. However, I think >> that when there's a failure, the lineage information is not being correctly >> reapplied. >> >> It goes something like this: >> >> val rawRDD = read(...) >> val repartRDD = rawRDD.repartition(X) >> >> val tx1 = repartRDD.map(...) >> var tx2 = tx1.map(...) >> >> while (...) { >> tx2 = tx1.zip(tx2).map(...) >> } >> >> >> Is there any way to monitor RDD's lineage, maybe even including? I want to >> make sure that there's no unexpected things happening. >
