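If you cached an RDD and still hold a reference to it in your code, the RDD will NOT be cleaned up. ContextCleaner keeps a ReferenceQueue that it uses to track references to RDDs, Broadcasts, Accumulators, etc. It holds only weak references to these objects, so once your code drops its last strong reference and the JVM garbage-collects the object, the weak reference is enqueued and the cleaner can remove the associated state (for a cached RDD, its cached blocks).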
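The weak-reference mechanism described above can be sketched in plain Java with `java.lang.ref.WeakReference` and `ReferenceQueue`. This is a minimal, hypothetical model, not Spark's code: `FakeRdd`, `CleanupRef`, `register` and `cleanOnce` are made-up names, and "unpersisting" here only records the RDD id. It shows that an object your program still references is never enqueued, while one that becomes unreachable is enqueued after the JVM collects it, at which point the cleaner can drop its cached state.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ContextCleanerSketch {

    /** Hypothetical stand-in for a cached RDD; only its id matters for cleanup. */
    static final class FakeRdd {
        final int id;
        FakeRdd(int id) { this.id = id; }
    }

    /** Weak reference that remembers which RDD's cached data to drop once the referent is collected. */
    static final class CleanupRef extends WeakReference<FakeRdd> {
        final int rddId;
        CleanupRef(FakeRdd rdd, ReferenceQueue<FakeRdd> queue) {
            super(rdd, queue);
            this.rddId = rdd.id;
        }
    }

    private final ReferenceQueue<FakeRdd> queue = new ReferenceQueue<>();
    // Keep the weak references themselves strongly reachable; otherwise they could be
    // collected before the JVM ever enqueues them.
    private final Set<CleanupRef> buffer = ConcurrentHashMap.newKeySet();
    final List<Integer> unpersisted = Collections.synchronizedList(new ArrayList<>());

    /** Hypothetical analogue of registering a persisted RDD for cleanup. */
    CleanupRef register(FakeRdd rdd) {
        CleanupRef ref = new CleanupRef(rdd, queue);
        buffer.add(ref);
        return ref;
    }

    /** Waits up to timeoutMillis for one enqueued reference and "unpersists" it. */
    boolean cleanOnce(long timeoutMillis) {
        try {
            CleanupRef ref = (CleanupRef) queue.remove(timeoutMillis);
            if (ref == null) {
                return false; // nothing became unreachable within the timeout
            }
            buffer.remove(ref);
            unpersisted.add(ref.rddId); // a real cleaner would remove the cached blocks here
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ContextCleanerSketch cleaner = new ContextCleanerSketch();
        FakeRdd stillReferenced = new FakeRdd(1);
        cleaner.register(stillReferenced);
        cleaner.register(new FakeRdd(2)); // no strong reference kept, so it is eligible for GC
        // GC timing is up to the JVM, so this demo only retries a few times.
        for (int attempt = 0; attempt < 5 && cleaner.unpersisted.isEmpty(); attempt++) {
            System.gc();
            cleaner.cleanOnce(200);
        }
        System.out.println("unpersisted: " + cleaner.unpersisted);
        System.out.println("still referenced: " + stillReferenced.id);
    }
}
```

Because `System.gc()` is only a hint, the demo polls a few times; a long-running cleaner would instead block on `queue.remove(...)` in a background loop, which is roughly how ContextCleaner's cleaning thread is organized. An RDD that was never cached has no blocks to drop, which is why only persisted RDDs need this bookkeeping.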
On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris <nasrulla.k...@microsoft.com.invalid> wrote:

> Thanks for the reply, Wenchen. I am curious about what happens when an RDD goes out of scope when it is not cached.
>
> Nasrulla
>
> *From:* Wenchen Fan <cloud0...@gmail.com>
> *Sent:* Tuesday, May 21, 2019 6:28 AM
> *To:* Nasrulla Khan Haris <nasrulla.k...@microsoft.com.invalid>
> *Cc:* dev@spark.apache.org
> *Subject:* Re: RDD object Out of scope.
>
> An RDD is a kind of pointer to the actual data. Unless it's cached, we don't need to clean up the RDD.
>
> On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris <nasrulla.k...@microsoft.com.invalid> wrote:
>
> > Hi Spark developers,
> >
> > Can someone point out the code where RDD objects go out of scope? I found the ContextCleaner <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178> code, in which only persisted RDDs are cleaned up at regular intervals, and only if the RDD is registered for cleanup. I have not found where the destructor for an RDD object is invoked. I am trying to understand when cleanup happens for an RDD that is not persisted.
> >
> > Thanks in advance; I appreciate your help.
> >
> > Nasrulla