GitHub user moustaki opened a pull request:

    https://github.com/apache/spark/pull/9457

    [SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank

    This implements a parallel version of personalized PageRank, which runs the
    propagations for every vertex in a list of source vertices simultaneously,
    rather than one source at a time.
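To illustrate what running all propagations at once means, here is a minimal in-memory Python sketch. It is not the code from this pull request, which targets GraphX in Scala and stores per-vertex scores as SparseVectors. The function `parallel_personalized_pagerank` and its parameters are hypothetical. The sketch assumes a common formulation in which each propagation restarts at its own source with probability `reset_prob`. Mass sitting on vertices without out-edges is simply dropped rather than redistributed.

```python
from collections import defaultdict


def parallel_personalized_pagerank(edges, sources, reset_prob=0.15, num_iter=50):
    """Hypothetical sketch: personalized PageRank for all `sources` at once.

    Every vertex keeps one score per source (a dense list here), so a single
    pass over the edge list per iteration advances all propagations together.
    Mass on vertices with no out-edges is not redistributed (it leaks away).
    """
    vertices = {v for edge in edges for v in edge} | set(sources)
    out_degree = defaultdict(int)
    for u, _ in edges:
        out_degree[u] += 1

    k = len(sources)
    # Initially, propagation i has all of its mass on its own source.
    ranks = {v: [0.0] * k for v in vertices}
    for i, s in enumerate(sources):
        ranks[s][i] = 1.0

    for _ in range(num_iter):
        new_ranks = {v: [0.0] * k for v in vertices}
        # One pass over the edges pushes mass for every source simultaneously.
        for u, v in edges:
            share = (1.0 - reset_prob) / out_degree[u]
            ru, rv = ranks[u], new_ranks[v]
            for i in range(k):
                rv[i] += share * ru[i]
        # Restart step: propagation i teleports back to its own source only.
        for i, s in enumerate(sources):
            new_ranks[s][i] += reset_prob
        ranks = new_ranks
    return ranks


edges = [("a", "b"), ("b", "c"), ("c", "a")]
ranks = parallel_personalized_pagerank(edges, sources=["a", "b"])
print(ranks["a"])  # score of vertex "a" with respect to sources "a" and "b"
```

Because each pass over the edges is shared by all k propagations, the per-edge overhead is paid once per iteration instead of once per source, which plausibly explains why the batched approach pulls ahead as the number of sources grows. Dense lists are used here for brevity; sparse vectors save memory when most vertices are reachable from only a few sources.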
    
    I ran a few benchmarks on the full [DBpedia](http://dbpedia.org/) graph.
    When running personalized PageRank for a single source node, the existing
    implementation is twice as fast as the parallel one (because of the
    SparseVector overhead). However, for 10 source nodes, the parallel
    implementation is four times as fast, and the gap widens further as the
    number of source nodes grows.
    
    
![image](https://cloud.githubusercontent.com/assets/2491/10927702/dd82e4fa-8256-11e5-89a8-4799b407f502.png)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/moustaki/spark parallel-ppr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9457
    
----
commit f41975efdbe1794fe456babb7f79d0290ecfc70d
Author: Yves Raimond <[email protected]>
Date:   2015-10-30T23:05:20Z

    Parallel personalized pagerank implementation

----


