Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/126#discussion_r10681316
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag](
checkpointData.flatMap(_.getCheckpointFile)
}
+ def cleanup() {
--- End diff ---
If I understand the code in
[CoGroupedRDD](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala#L84)
correctly, a new dependency object is created every time a CoGroupedRDD is
created (join uses cogroup underneath). So even though rddA and rddB both
depend on the same rdd1, they should not share the shuffle dependency.
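To make that concrete, here is a minimal toy sketch (hypothetical `Toy*` names, not the real Spark classes) of the pattern in `CoGroupedRDD.getDependencies`: each CoGroupedRDD instance constructs its own fresh dependency objects, so two cogroups over the same parent never share one.

```scala
// Toy model of the CoGroupedRDD dependency pattern (names are illustrative).
class ToyRDD(val name: String)

class ToyShuffleDependency(val rdd: ToyRDD)

class ToyCoGroupedRDD(val parents: Seq[ToyRDD]) {
  // Mirrors the idea in getDependencies: a *new* dependency object is
  // created for each parent, every time a ToyCoGroupedRDD is constructed.
  val dependencies: Seq[ToyShuffleDependency] =
    parents.map(p => new ToyShuffleDependency(p))
}

val rdd1 = new ToyRDD("rdd1")
val rddA = new ToyCoGroupedRDD(Seq(rdd1)) // e.g. rdd1 joined with one RDD
val rddB = new ToyCoGroupedRDD(Seq(rdd1)) // e.g. rdd1 joined with another

// Both cogroups reference the same parent RDD instance...
assert(rddA.dependencies.head.rdd eq rddB.dependencies.head.rdd)
// ...but their ShuffleDependency objects are distinct, not shared.
assert(!(rddA.dependencies.head eq rddB.dependencies.head))
```

So cleaning up the dependency held by rddA would not, under this reading, invalidate the one held by rddB.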
Regarding the new code snippet: yes, that would cause problems for rdd2, but
the same problem can already be triggered today with rdd1.unpersist().