There is no circular dependency. It's simply dropping references to previous 
RDDs because there is no need for them.

I wonder, though, whether losing references to the intermediate RDDs messes 
things up internally for Spark.
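
On monitoring lineage: RDD.toDebugString prints the chain of parent RDDs, and 
rdd.dependencies exposes the same graph programmatically. Below is a minimal 
sketch, assuming a local SparkContext and toy data in place of the real job; 
the lineageDepth helper and the LineageCheck object are names made up for 
illustration, not part of Spark. If the reassignment of tx2 behaves as 
expected, the depth should simply grow each iteration, with tx1 appearing as 
a shared parent.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LineageCheck {
  // Hypothetical helper: length of the longest chain of parent RDDs,
  // counting the RDD itself. Walks the graph via rdd.dependencies.
  def lineageDepth(rdd: RDD[_]): Int =
    if (rdd.dependencies.isEmpty) 1
    else 1 + rdd.dependencies.map(d => lineageDepth(d.rdd)).max

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with toy data standing in for read(...).
    val conf = new SparkConf().setAppName("lineage-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val tx1 = sc.parallelize(1 to 100, 4).map(_.toDouble)
      var tx2 = tx1.map(_ * 2)

      for (i <- 1 to 3) {
        // Same shape as the loop in the original job: the new tx2 depends
        // on tx1 and on the previous tx2, which stays reachable via lineage.
        tx2 = tx1.zip(tx2).map { case (a, b) => a + b }
        println(s"iteration $i: lineage depth = ${lineageDepth(tx2)}")
        println(tx2.toDebugString)
      }
    } finally {
      sc.stop()
    }
  }
}
```

The previous tx2 is still referenced by the new RDD's dependencies even after 
the var is reassigned, so the driver-side reference going away should not by 
itself drop it from the lineage.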

> On Oct 8, 2014, at 12:13 PM, Akshat Aranya <[email protected]> wrote:
> 
> Using a var for RDDs in this way is not going to work.  In this example, 
> tx1.zip(tx2) would create an RDD that depends on tx2, but then soon after 
> that, you change what tx2 means, so you would end up having a circular 
> dependency.
> 
>> On Wed, Oct 8, 2014 at 12:01 PM, Sung Hwan Chung <[email protected]> 
>> wrote:
>> My job is not being fault-tolerant (e.g., when there's a fetch failure or 
>> something).
>> 
>> The lineage of the RDDs is updated every iteration. However, I think 
>> that when there's a failure, the lineage information is not being correctly 
>> reapplied.
>> 
>> It goes something like this:
>> 
>> val rawRDD = read(...)
>> val repartRDD = rawRDD.repartition(X)
>> 
>> val tx1 = repartRDD.map(...)
>> var tx2 = tx1.map(...)
>> 
>> while (...) {
>>   tx2 = tx1.zip(tx2).map(...)
>> }
>> 
>> 
>> Is there any way to monitor an RDD's lineage, maybe even including? I want to 
>> make sure that there are no unexpected things happening.
> 
