RE: RDD object Out of scope.

2019-05-21 Thread Nasrulla Khan Haris
Thanks Sean, that makes sense. 

Regards,
Nasrulla

-----Original Message-----
From: Sean Owen  
Sent: Tuesday, May 21, 2019 6:24 PM
To: Nasrulla Khan Haris 
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.

I'm not clear what you're asking. An RDD itself is just an object in the JVM. 
It will be garbage collected if there are no references. What else would there 
be to clean up in your case? ContextCleaner handles cleanup of persisted 
RDDs, etc.

On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris 
 wrote:
>
> I am trying to find the code that cleans up uncached RDDs.
>
>
>
> Thanks,
>
> Nasrulla
>
>
>
> From: Charoes 
> Sent: Tuesday, May 21, 2019 5:10 PM
> To: Nasrulla Khan Haris 
> Cc: Wenchen Fan ; dev@spark.apache.org
> Subject: Re: RDD object Out of scope.
>
>
>
> If you cache an RDD and hold a reference to it in your code, the RDD will 
> NOT be cleaned up.
>
> ContextCleaner has a ReferenceQueue, which it uses to track references to 
> RDDs, Broadcasts, Accumulators, etc.
>
>
>
> On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris 
>  wrote:
>
> Thanks for the reply, Wenchen. I am curious what happens when an RDD that 
> is not cached goes out of scope.
>
>
>
> Nasrulla
>
>
>
> From: Wenchen Fan 
> Sent: Tuesday, May 21, 2019 6:28 AM
> To: Nasrulla Khan Haris 
> Cc: dev@spark.apache.org
> Subject: Re: RDD object Out of scope.
>
>
>
> An RDD is kind of a pointer to the actual data. Unless it's cached, we don't 
> need to clean up the RDD.
>
>
>
> On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris 
>  wrote:
>
> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I found 
> the ContextCleaner code, in which only persisted RDDs are cleaned up at 
> regular intervals if the RDD is registered for cleanup. I have not found where 
> the destructor for the RDD object is invoked. I am trying to understand when 
> cleanup happens for an RDD that is not persisted.
>
>
>
> Thanks in advance, appreciate your help.
>
> Nasrulla
>
>


Re: RDD object Out of scope.

2019-05-21 Thread Sean Owen
I'm not clear what you're asking. An RDD itself is just an object in
the JVM. It will be garbage collected if there are no references. What
else would there be to clean up in your case? ContextCleaner handles
cleanup of persisted RDDs, etc.

On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris
 wrote:
>
> I am trying to find the code that cleans up uncached RDDs.
>
>
>
> Thanks,
>
> Nasrulla
>
>
>
> From: Charoes 
> Sent: Tuesday, May 21, 2019 5:10 PM
> To: Nasrulla Khan Haris 
> Cc: Wenchen Fan ; dev@spark.apache.org
> Subject: Re: RDD object Out of scope.
>
>
>
> If you cache an RDD and hold a reference to it in your code, the RDD will 
> NOT be cleaned up.
>
> ContextCleaner has a ReferenceQueue, which it uses to track references to 
> RDDs, Broadcasts, Accumulators, etc.
>
>
>
> On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris 
>  wrote:
>
> Thanks for the reply, Wenchen. I am curious what happens when an RDD that 
> is not cached goes out of scope.
>
>
>
> Nasrulla
>
>
>
> From: Wenchen Fan 
> Sent: Tuesday, May 21, 2019 6:28 AM
> To: Nasrulla Khan Haris 
> Cc: dev@spark.apache.org
> Subject: Re: RDD object Out of scope.
>
>
>
> An RDD is kind of a pointer to the actual data. Unless it's cached, we don't 
> need to clean up the RDD.
>
>
>
> On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris 
>  wrote:
>
> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I found 
> the ContextCleaner code, in which only persisted RDDs are cleaned up at 
> regular intervals if the RDD is registered for cleanup. I have not found where 
> the destructor for the RDD object is invoked. I am trying to understand when 
> cleanup happens for an RDD that is not persisted.
>
>
>
> Thanks in advance, appreciate your help.
>
> Nasrulla
>
>




RE: RDD object Out of scope.

2019-05-21 Thread Nasrulla Khan Haris
I am trying to find the code that cleans up uncached RDDs.

Thanks,
Nasrulla

From: Charoes 
Sent: Tuesday, May 21, 2019 5:10 PM
To: Nasrulla Khan Haris 
Cc: Wenchen Fan ; dev@spark.apache.org
Subject: Re: RDD object Out of scope.

If you cache an RDD and hold a reference to it in your code, the RDD will 
NOT be cleaned up.
ContextCleaner has a ReferenceQueue, which it uses to track references to 
RDDs, Broadcasts, Accumulators, etc.

On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris 
<nasrulla.k...@microsoft.com.invalid> wrote:
Thanks for the reply, Wenchen. I am curious what happens when an RDD that 
is not cached goes out of scope.

Nasrulla

From: Wenchen Fan <cloud0...@gmail.com>
Sent: Tuesday, May 21, 2019 6:28 AM
To: Nasrulla Khan Haris <nasrulla.k...@microsoft.com.invalid>
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.

An RDD is kind of a pointer to the actual data. Unless it's cached, we don't need 
to clean up the RDD.

On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris 
<nasrulla.k...@microsoft.com.invalid> wrote:
Hi Spark developers,

Can someone point out the code where RDD objects go out of scope? I found the 
ContextCleaner code 
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178>, 
in which only persisted RDDs are cleaned up at regular intervals if the 
RDD is registered for cleanup. I have not found where the destructor for the RDD 
object is invoked. I am trying to understand when cleanup happens for an RDD 
that is not persisted.

Thanks in advance, appreciate your help.
Nasrulla



Re: RDD object Out of scope.

2019-05-21 Thread Charoes
If you cache an RDD and hold a reference to it in your code, the RDD will
NOT be cleaned up.
ContextCleaner has a ReferenceQueue, which it uses to track references to
RDDs, Broadcasts, Accumulators, etc.
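
The ReferenceQueue pattern mentioned above can be sketched in plain Java. This is
only a minimal illustration, not Spark's actual code: the names CleanupTracker,
TaskRef, register and drain are hypothetical, and the cleanup "task" is just a
string label. The idea is that each tracked object gets a weak reference tied to
a queue; once the object is no longer strongly reachable and the garbage
collector has processed it, the reference shows up on the queue and the
associated cleanup can run.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified illustration of the pattern; not Spark's ContextCleaner. */
final class CleanupTracker {

    /** A weak reference that remembers which cleanup task belongs to its referent. */
    static final class TaskRef extends WeakReference<Object> {
        final String task;

        TaskRef(Object referent, String task, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.task = task;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // The TaskRef objects themselves must stay strongly reachable,
    // otherwise they could be collected before they are ever enqueued.
    private final Set<TaskRef> pending = ConcurrentHashMap.newKeySet();

    /** Starts tracking owner; the tracker holds it only weakly. */
    WeakReference<Object> register(Object owner, String task) {
        TaskRef ref = new TaskRef(owner, task, queue);
        pending.add(ref);
        return ref;
    }

    /** Non-blocking: returns the tasks whose references have been enqueued. */
    List<String> drain() {
        List<String> due = new ArrayList<>();
        TaskRef ref;
        while ((ref = (TaskRef) queue.poll()) != null) {
            pending.remove(ref);
            due.add(ref.task);
        }
        return due;
    }
}
```

As long as the caller still holds a strong reference to the registered object,
drain() returns nothing for it, which matches the point above: a cached RDD that
is still referenced is not cleaned up. When exactly a reference is enqueued
depends on when the garbage collector runs, so the timing is not deterministic.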

On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris
 wrote:

> Thanks for the reply, Wenchen. I am curious what happens when an RDD that
> is not cached goes out of scope.
>
>
>
> Nasrulla
>
>
>
> *From:* Wenchen Fan 
> *Sent:* Tuesday, May 21, 2019 6:28 AM
> *To:* Nasrulla Khan Haris 
> *Cc:* dev@spark.apache.org
> *Subject:* Re: RDD object Out of scope.
>
>
>
> An RDD is kind of a pointer to the actual data. Unless it's cached, we don't
> need to clean up the RDD.
>
>
>
> On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris <
> nasrulla.k...@microsoft.com.invalid> wrote:
>
> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I
> found the ContextCleaner code
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178>,
> in which only persisted RDDs are cleaned up at regular intervals if
> the RDD is registered for cleanup. I have not found where the destructor for
> the RDD object is invoked. I am trying to understand when cleanup happens
> for an RDD that is not persisted.
>
>
>
> Thanks in advance, appreciate your help.
>
> Nasrulla
>
>
>
>


RE: RDD object Out of scope.

2019-05-21 Thread Nasrulla Khan Haris
Thanks for the reply, Wenchen. I am curious what happens when an RDD that 
is not cached goes out of scope.

Nasrulla

From: Wenchen Fan 
Sent: Tuesday, May 21, 2019 6:28 AM
To: Nasrulla Khan Haris 
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.

An RDD is kind of a pointer to the actual data. Unless it's cached, we don't need 
to clean up the RDD.

On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris 
<nasrulla.k...@microsoft.com.invalid> wrote:
Hi Spark developers,

Can someone point out the code where RDD objects go out of scope? I found the 
ContextCleaner code 
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178>, 
in which only persisted RDDs are cleaned up at regular intervals if the 
RDD is registered for cleanup. I have not found where the destructor for the RDD 
object is invoked. I am trying to understand when cleanup happens for an RDD 
that is not persisted.

Thanks in advance, appreciate your help.
Nasrulla



Re: RDD object Out of scope.

2019-05-21 Thread Wenchen Fan
An RDD is kind of a pointer to the actual data. Unless it's cached, we don't
need to clean up the RDD.
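
To make the "pointer to the data" idea concrete, here is a toy model in plain
Java. It is a sketch under stated assumptions, not Spark's RDD class: the name
LazyDataset and its methods of, map and compute are hypothetical. Each object
stores only its parent and a function (its lineage), and the elements are
recomputed whenever compute() is called. Nothing is materialized and kept, so
when such an object becomes unreachable there is nothing beyond the object
itself for the JVM garbage collector to reclaim.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical toy model of a lazily evaluated dataset; not Spark's RDD. */
abstract class LazyDataset<T> {

    /** Recomputes the elements from the lineage on every call; nothing is stored. */
    abstract List<T> compute();

    /** Wraps an in-memory source list as the root of a lineage. */
    static <T> LazyDataset<T> of(List<T> source) {
        return new LazyDataset<T>() {
            @Override
            List<T> compute() {
                return new ArrayList<>(source);
            }
        };
    }

    /** Records a transformation; the function is not applied until compute() runs. */
    <U> LazyDataset<U> map(Function<T, U> f) {
        LazyDataset<T> parent = this;
        return new LazyDataset<U>() {
            @Override
            List<U> compute() {
                List<U> out = new ArrayList<>();
                for (T x : parent.compute()) {
                    out.add(f.apply(x));
                }
                return out;
            }
        };
    }
}
```

Calling map only builds a new small object that points at its parent; the
mapping function runs when compute() is invoked. Caching changes the picture
because materialized data is then held somewhere and has to be released, which
is the case ContextCleaner deals with.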

On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris
 wrote:

> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I
> found the ContextCleaner code, in which only persisted RDDs are cleaned
> up at regular intervals if the RDD is registered for cleanup. I have not
> found where the destructor for the RDD object is invoked. I am trying to
> understand when cleanup happens for an RDD that is not persisted.
>
>
>
> Thanks in advance, appreciate your help.
>
> Nasrulla
>
>
>