GitHub user ankurdave opened a pull request:

    https://github.com/apache/spark/pull/2308

    [SPARK-3427] [GraphX] Avoid active vertex tracking in static PageRank

    GraphX's current static (fixed iteration count) PageRank is implemented on 
top of the Pregel API. This unnecessarily tracks active vertices, even though 
in static PageRank every vertex is active in every iteration. Active vertex 
tracking incurs the following costs:
    
    1. A shuffle per iteration to ship the active sets to the edge partitions.
    2. A hash table creation per iteration at each partition to index the 
active sets for lookup.
    3. A hash lookup per edge to check whether the source vertex is active.
    
    I reimplemented static PageRank using the GraphX API instead of the 
higher-level Pregel API. In benchmarks on a 16-node m2.4xlarge cluster, this 
provided a 23% speedup (from 514 s to 397 s, mean over 3 trials) for 10 
iterations of PageRank on a synthetic graph with 10M vertices and 1.27B edges.
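
    For reference, here is a rough Scala sketch of what the non-Pregel 
formulation looks like using GraphX operators available at the time 
(mapReduceTriplets, outerJoinVertices). The method name and exact update rule 
are illustrative only and are not necessarily the code in this patch:

        import org.apache.spark.graphx._

        // Hypothetical sketch of static PageRank written directly against
        // GraphX operators, with no active-set bookkeeping.
        def staticPageRankSketch[VD, ED](graph: Graph[VD, ED], numIter: Int,
            resetProb: Double = 0.15): Graph[Double, Double] = {
          // Attach out-degrees, normalize edge weights by out-degree, and
          // start every rank at 1.0.
          var rankGraph: Graph[Double, Double] = graph
            .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
            .mapTriplets(e => 1.0 / e.srcAttr)
            .mapVertices((vid, attr) => 1.0)

          var i = 0
          while (i < numIter) {
            // Every vertex contributes along every out-edge in every
            // iteration, so no active set is shipped, indexed, or checked.
            val rankUpdates = rankGraph.mapReduceTriplets[Double](
              triplet => Iterator((triplet.dstId, triplet.srcAttr * triplet.attr)),
              _ + _)

            // Standard PageRank update; vertices with no in-edges keep the
            // reset value.
            rankGraph = rankGraph.outerJoinVertices(rankUpdates) { (vid, oldRank, msgSum) =>
              resetProb + (1.0 - resetProb) * msgSum.getOrElse(0.0)
            }
            i += 1
          }
          rankGraph
        }

    With no active set attached to the triplet scan, each edge partition 
simply iterates over all of its edges every round, which is exactly the 
behavior static PageRank wants and avoids the three costs listed above.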

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankurdave/spark SPARK-3427

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2308
    
----
commit 449996ae1ceba7536c0a24491f8ae00696df1cf1
Author: Ankur Dave <[email protected]>
Date:   2014-09-06T23:47:45Z

    Avoid unnecessary active vertex tracking in static PageRank

----

