Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/126#discussion_r10682180
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag](
         checkpointData.flatMap(_.getCheckpointFile)
       }
     
    +  def cleanup() {
    --- End diff ---
    
    Ah, right, I'm following you on the original code snippet now.
    
    Yes, cleanup() isn't really any worse than unpersist() with respect to the second snippet. That tends to argue more for removing unpersist from the API than for including cleanup (and, yes, I realize that you are not arguing for including cleanup in the public API). Ideally, the automatic garbage collection introduced with this PR is sufficient to handle all cleanup of RDD data and metadata, after which a public unpersist() or cleanup() is no more necessary than an explicit means of destroying and garbage-collecting Java objects. If we do want to keep an explicit unpersist/cleanup mechanism, I think it should live behind a system-administration kind of interface and be protected by lots of "You're going to shoot your eye out!" warnings.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to