Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/126#discussion_r10679953
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag](
         checkpointData.flatMap(_.getCheckpointFile)
       }
     
    +  def cleanup() {
    --- End diff ---
    
    Actually, the current implementation won't. Calling rddA.cleanup() will only do two things:
    1. Unpersist rddA.
    2. Delete the shuffle dependencies, and the corresponding shuffle data, related only to rddA.
    
    Let's assume rddA has two shuffle dependencies, s1 and s2, one each to rdd1 and rdd2. These shuffle dependencies are not shared with rddB, so cleaning rddA with the current implementation of RDD.cleanup() will not affect rddB. The current implementation therefore cannot be directly "misused". But it further reinforces Patrick's earlier point in this thread: the desired semantics are not clear, and it is best to mark this function as private[spark] for now.
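
    The scenario above can be sketched with a toy model in plain Scala (these are not Spark's actual classes; `ShuffleDep`, `Rdd`, and the `cleanup` helper below are all illustrative names):

    ```scala
    // Toy model of the lineage discussed above:
    // rdd1 --s1--> rddA, rdd2 --s2--> rddA; rddB has its own, unshared dependency.
    case class ShuffleDep(id: String)
    case class Rdd(name: String, deps: Set[ShuffleDep], var persisted: Boolean = true)

    // Sketch of the described semantics: unpersist the RDD and drop only its
    // own shuffle data. Other RDDs' shuffle dependencies are untouched.
    def cleanup(rdd: Rdd, shuffleData: scala.collection.mutable.Set[ShuffleDep]): Unit = {
      rdd.persisted = false
      shuffleData --= rdd.deps
    }

    object Demo extends App {
      val s1 = ShuffleDep("s1"); val s2 = ShuffleDep("s2"); val s3 = ShuffleDep("s3")
      val rddA = Rdd("rddA", Set(s1, s2))
      val rddB = Rdd("rddB", Set(s3))
      val shuffleData = scala.collection.mutable.Set(s1, s2, s3)

      cleanup(rddA, shuffleData)
      assert(!rddA.persisted)          // rddA is unpersisted
      assert(rddB.persisted)           // rddB is unaffected
      assert(shuffleData == Set(s3))   // only rddA's shuffle data was removed
    }
    ```

    In this model, cleaning rddA cannot reach rddB's shuffle data at all, which is why the misuse in question does not arise; the open question is only whether these are the semantics users would expect.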

