GitHub user ankurdave opened a pull request:

    https://github.com/apache/spark/pull/972

    [SPARK-2025] Unpersist edges of previous graph in Pregel

    Due to a bug introduced by apache/spark#497, Pregel does not unpersist
    replicated vertices from previous iterations. As a result, they stay cached
    until memory is full, wasting GC time.
    
    This PR corrects the problem by unpersisting both the edges and the
    replicated vertices of previous iterations. This is safe because the edges
    and replicated vertices of the current iteration are cached by the call to
    `g.cache()` and then materialized by the call to `messages.count()`.
    Therefore no unmaterialized RDDs depend on `prevG.edges`. I verified that
    no recomputation occurs by running PageRank with a custom patch to Spark
    that warns when a partition is recomputed.
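
    The cache, materialize, then unpersist ordering described above can be
    sketched roughly as follows. This is an illustration, not the code changed
    in this PR. It assumes Spark 1.2 or later, where `aggregateMessages` is
    available. The names `UnpersistSketch` and `iterate`, and the "add the sum
    of in-neighbour values" update, are hypothetical.

    ```scala
    import org.apache.spark.graphx._
    import scala.reflect.ClassTag

    object UnpersistSketch {
      // Hypothetical helper: each iteration adds the sum of in-neighbour
      // values to every vertex, then releases the previous iteration's caches.
      def iterate[ED: ClassTag](graph: Graph[Double, ED], numIter: Int): Graph[Double, ED] = {
        // Derive a new graph first so the caller's graph is never unpersisted here.
        var g = graph.mapVertices((_, v) => v).cache()
        var messages = g.aggregateMessages[Double](ctx => ctx.sendToDst(ctx.srcAttr), _ + _).cache()
        messages.count()

        var i = 0
        while (i < numIter) {
          val prevG = g
          val oldMessages = messages

          // Build and cache the current iteration's graph and messages.
          g = g.outerJoinVertices(messages) { (_, old, msgOpt) => old + msgOpt.getOrElse(0.0) }.cache()
          messages = g.aggregateMessages[Double](ctx => ctx.sendToDst(ctx.srcAttr), _ + _).cache()

          // Force computation of `messages`; per the reasoning above, this also
          // materializes the cached edges and replicated vertices of `g`.
          messages.count()

          // Only now release the RDDs of the previous iteration.
          oldMessages.unpersist(blocking = false)
          prevG.unpersistVertices(blocking = false)
          prevG.edges.unpersist(blocking = false)
          i += 1
        }
        g
      }
    }
    ```

    The ordering is the point of the sketch: if the `unpersist` calls ran
    before `messages.count()`, the current iteration would still depend on
    unmaterialized data derived from `prevG` and could trigger recomputation.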
    
    Thanks to Tim Weninger for reporting this bug.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankurdave/spark SPARK-2025

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/972.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #972
    
----
commit 13d5b07eb48999a967935d6349be556f21f8db2c
Author: Ankur Dave
Date:   2014-06-05T00:19:29Z

    Unpersist edges of previous graph in Pregel

----


