GitHub user moustaki opened a pull request:
https://github.com/apache/spark/pull/14998
[SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank
(Updated version of [PR-9457](https://github.com/apache/spark/pull/9457),
rebased on latest Spark master, and using mllib-local).
This implements a parallel version of personalized PageRank, which runs the
propagations for every vertex in a list of sources in parallel.
I ran a few benchmarks on the full [DBpedia](http://dbpedia.org/) graph.
When running personalized PageRank for only one source node, the existing
implementation is twice as fast as the parallel one (because of the
SparseVector overhead). However, for 10 source nodes, the parallel
implementation is four times as fast, and the gap widens as the number of
source nodes increases.
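
To illustrate the idea described above, here is a minimal, single-machine
sketch in plain Python. It is not the code from this pull request and does not
use the GraphX API. The function name, the dict-based sparse vectors, and the
update rule (reset probability added only at each source vertex) are
assumptions made for illustration. Each vertex carries one sparse vector with
a slot per source, so a single pass over the edges advances every source's
propagation at once.

```python
from collections import defaultdict


def parallel_personalized_pagerank(edges, sources, num_iter=20, reset_prob=0.15):
    """Hypothetical sketch: personalized PageRank for several sources at once.

    edges   -- list of (src, dst) pairs of a directed graph
    sources -- list of source vertices; slot i of each vector belongs to sources[i]
    Returns a dict: vertex -> {source index -> rank} (a sparse vector per vertex).
    """
    out_degree = defaultdict(int)
    vertices = set(sources)
    for src, dst in edges:
        out_degree[src] += 1
        vertices.update((src, dst))

    # Initially each source holds all of the mass in its own slot.
    ranks = {v: {} for v in vertices}
    for i, s in enumerate(sources):
        ranks[s][i] = 1.0

    for _ in range(num_iter):
        # One pass over the edges propagates every non-zero slot together.
        incoming = {v: defaultdict(float) for v in vertices}
        for src, dst in edges:
            weight = 1.0 / out_degree[src]
            for i, r in ranks[src].items():
                incoming[dst][i] += r * weight

        # Damp the propagated mass, then add the reset mass at each source only.
        new_ranks = {
            v: {i: (1.0 - reset_prob) * r for i, r in incoming[v].items()}
            for v in vertices
        }
        for i, s in enumerate(sources):
            new_ranks[s][i] = new_ranks[s].get(i, 0.0) + reset_prob
        ranks = new_ranks

    return ranks
```

In this sketch a single-source run still pays for the dict bookkeeping on every
edge, which loosely mirrors the SparseVector overhead mentioned above, while the
edge traversal is shared across all sources.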

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/moustaki/spark parallel-ppr
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14998.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14998
----
commit f41975efdbe1794fe456babb7f79d0290ecfc70d
Author: Yves Raimond <[email protected]>
Date: 2015-10-30T23:05:20Z
Parallel personalized pagerank implementation
commit 3605e40e8e61c853c89637918c82c7e0bcec69e8
Author: Yves Raimond <[email protected]>
Date: 2015-11-04T17:39:05Z
Scala style tweaks
commit 8b34e5ce0fd25423349c4ded45859ef3a02153cf
Author: Yves Raimond <[email protected]>
Date: 2015-11-24T01:26:43Z
Removing breeze dependency from mllib (available through graphx)
commit 508ba451eb636365388fc5bf77b531ce1d3d70d4
Author: Yves Raimond <[email protected]>
Date: 2015-11-24T01:29:02Z
Renaming SparseVector to BSV
commit 09d31c852e8213a2923de68b6e3f2b3ec6f71166
Author: Yves Raimond <[email protected]>
Date: 2015-11-24T01:30:39Z
Removing extra space, extra line
commit 85063535bd29936c260844f1fd815faccb474169
Author: Yves Raimond <[email protected]>
Date: 2015-11-24T01:39:41Z
Code-style changes
commit 2d1dee77f16bb2540bb46c7169dfd9c7e8228d77
Author: Yves Raimond <[email protected]>
Date: 2015-10-30T23:05:20Z
Parallel personalized pagerank implementation
commit 202acb2bb1f2451646e7aa69f1b0152cc5fb623e
Author: Yves Raimond <[email protected]>
Date: 2015-11-04T17:39:05Z
Scala style tweaks
commit 69db3855e5485847f3951f87c1f14ed3059a7f6f
Author: Yves Raimond <[email protected]>
Date: 2015-11-24T01:29:02Z
Renaming SparseVector to BSV
commit a42d272c21955d838ac50eeba53ca1b74bf2c4d2
Author: Yves Raimond <[email protected]>
Date: 2015-11-24T01:30:39Z
Removing extra space, extra line
commit 53ab670cd48375732535a95368743fc57fc4e942
Author: Yves Raimond <[email protected]>
Date: 2015-11-24T01:39:41Z
Code-style changes
commit 31e2e98d8ea1e374141b7e87101966513eceb992
Author: Yves Raimond <[email protected]>
Date: 2016-09-05T17:28:55Z
Moving to mllib-local
commit 13fbf55474d85a76961f72b9e529adaa103885c1
Author: Yves Raimond <[email protected]>
Date: 2016-09-05T17:33:22Z
Merge branch 'parallel-ppr' of github.com:moustaki/spark into parallel-ppr
commit c7ca220423054137480ce197014d898afc83c745
Author: Yves Raimond <[email protected]>
Date: 2016-09-05T17:38:04Z
Cleaning up pom dependencies
commit 1ec345f2168b4b4f54847aed0dccf9d8c1c06873
Author: Yves Raimond <[email protected]>
Date: 2016-09-05T21:53:15Z
Removing unused import
----