GitHub user aray commented on the issue:
https://github.com/apache/spark/pull/16271
**References**
[PageRank paper](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)
> We need to make an initial assignment of the ranks. This assignment can
be made by one of several strategies. If it is going to iterate until
convergence, in general the initial values will not affect final values, just
the rate of convergence. But we can speed
up convergence by choosing a good initial assignment.
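To illustrate the quoted point, here is a minimal pure-Python power-iteration sketch of the variant whose ranks sum to N. It is not Spark's GraphX code. The 4-node graph, the damping factor 0.85, the tolerance, and the helper name `pagerank_sum_n` are all illustrative assumptions, and the graph deliberately has no dangling nodes. Two quite different starting vectors should end at the same ranks. Only the number of iterations needed may differ.

```python
def pagerank_sum_n(out_links, init, d=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PR(p) = (1 - d) + d * sum(PR(q) / outdeg(q)).

    Hypothetical helper: stops once the L1 change between iterates drops
    below `tol` and returns (ranks, iterations_used). Assumes every page
    has at least one out-link (no dangling nodes).
    """
    n = len(out_links)
    ranks = list(init)
    for it in range(1, max_iter + 1):
        new = [1.0 - d] * n
        for q, targets in enumerate(out_links):
            # Page q splits d * PR(q) evenly across its out-links.
            share = d * ranks[q] / len(targets)
            for p in targets:
                new[p] += share
        delta = sum(abs(a - b) for a, b in zip(new, ranks))
        ranks = new
        if delta < tol:
            return ranks, it
    return ranks, max_iter


# Hypothetical toy graph: out_links[q] lists the pages that page q links to.
out_links = [[1, 2], [2], [0], [0, 2]]
n = len(out_links)

uniform, it_uniform = pagerank_sum_n(out_links, [1.0] * n)  # every page starts at 1
lopsided, it_lopsided = pagerank_sum_n(out_links, [10.0, 0.0, 0.0, 0.0])

print("iterations:", it_uniform, it_lopsided)
print("max difference:", max(abs(a - b) for a, b in zip(uniform, lopsided)))
```

The iteration is a contraction (each step shrinks the L1 error by at least a factor of d), which is why the starting vector can only change how long it takes, not where it ends up.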
Since they are more focused on updating values for one evolving graph (the
Internet), they don't really talk about starting from scratch. But this does
emphasize that the initialization doesn't change the answers, just the rate of
convergence.
A more direct statement comes from
[Wikipedia](https://en.wikipedia.org/wiki/PageRank):
> PageRank is initialized to the same value for all pages. In the original
form of PageRank, the sum of PageRank over all pages was the total number of
pages on the web at that time, so each page in this example would have an
initial value of 1.
Note that there are two variants of PageRank. They differ in how the damping
term is scaled, (1 - d)/N versus (1 - d) where d is the damping factor, and as
a result their outputs differ by a constant multiple of N. We use the version
that sums to N (most other implementations use the one that sums to 1). Again
from Wikipedia:
> The difference between them is that the PageRank values in the first
formula sum to one, while in the second formula each PageRank is multiplied by
N and the sum becomes N.
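A small sketch of the two variants side by side, assuming a hypothetical helper `pagerank` (not Spark's or NetworkX's API) and a toy graph with no dangling nodes. Because the update is linear, scaling both the damping term and the starting vector by N scales every iterate, and hence the result, by exactly N.

```python
def pagerank(out_links, total, d=0.85, iters=200):
    """Fixed number of power-iteration steps for the variant whose ranks sum to `total`.

    total=1 gives PR(p) = (1 - d)/N + d * sum(PR(q) / outdeg(q));
    total=N gives PR(p) = (1 - d)   + d * sum(PR(q) / outdeg(q)).
    Hypothetical helper; assumes no dangling nodes.
    """
    n = len(out_links)
    base = (1.0 - d) * total / n  # damping term: (1 - d)/N or (1 - d)
    ranks = [total / n] * n       # start at the correct sum: 1/N or 1 each
    for _ in range(iters):
        new = [base] * n
        for q, targets in enumerate(out_links):
            share = d * ranks[q] / len(targets)
            for p in targets:
                new[p] += share
        ranks = new
    return ranks


out_links = [[1, 2], [2], [0], [0, 2]]  # hypothetical toy graph
n = len(out_links)

sums_to_one = pagerank(out_links, total=1.0)     # the sum-to-1 convention
sums_to_n = pagerank(out_links, total=float(n))  # the sum-to-N convention

# Each rank in the sum-to-N variant is N times the corresponding sum-to-1 rank.
print(all(abs(b - n * a) < 1e-9 for a, b in zip(sums_to_one, sums_to_n)))
```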
Essentially, starting with the correct sum puts you closer to the actual fixed
point and thus gives you faster convergence.
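One way to see the "correct sum" point, assuming no dangling nodes so every page's rank is fully redistributed: summing the sum-to-N update over all pages gives sum(x_{k+1}) = (1 - d)N + d * sum(x_k), hence sum(x_k) - N = d^k (sum(x_0) - N). Starting at total N removes that error component entirely, while starting at 1/N per page leaves a total error of d^k (1 - N). The sketch below (hypothetical helper `step`, toy graph) checks this numerically. It only tracks the total; the individual ranks still carry other error components, so it is not a guarantee of fewer iterations on every graph.

```python
def step(out_links, ranks, d=0.85):
    """One power-iteration step of the sum-to-N variant (assumes no dangling nodes)."""
    new = [1.0 - d] * len(out_links)
    for q, targets in enumerate(out_links):
        share = d * ranks[q] / len(targets)
        for p in targets:
            new[p] += share
    return new


out_links = [[1, 2], [2], [0], [0, 2]]  # hypothetical toy graph
n = len(out_links)
d = 0.85

from_ones = [1.0] * n       # sum(x_0) = N: already the fixed point's total
from_inv_n = [1.0 / n] * n  # sum(x_0) = 1: total is off by 1 - N

for k in range(1, 11):
    from_ones = step(out_links, from_ones, d)
    from_inv_n = step(out_links, from_inv_n, d)
    # Columns: k, total error from ones, total error from 1/N, predicted d**k * (1 - N)
    print(k, sum(from_ones) - n, sum(from_inv_n) - n, d ** k * (1.0 - n))
```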
The [NetworkX
implementation](https://github.com/networkx/networkx/blob/master/networkx/algorithms/link_analysis/pagerank_alg.py#L122)
uses the variant that sums to 1, hence its initial values are all 1/N.
igraph is unfortunately not comparable, as it uses a [more complex linear
solver
approach](https://github.com/igraph/igraph/blob/master/src/prpack/prpack_solver.cpp).
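For comparison, the sum-to-N PageRank can also be computed without iterating at all, by solving the linear system (I - dM)x = (1 - d) * 1, where M[p][q] = 1/outdeg(q) when q links to p. The sketch below uses plain dense Gaussian elimination on a toy graph. It only illustrates the linear-system formulation and is not prpack's actual algorithm, which is considerably more involved; the helper name and graph are hypothetical. Note that there is no initial vector here.

```python
def pagerank_linear_solve(out_links, d=0.85):
    """Solve (I - d*M) x = (1 - d) * 1 directly, with M[p][q] = 1/outdeg(q) if q links to p.

    Hypothetical helper using dense Gaussian elimination with partial pivoting;
    fine for toy graphs only. Assumes no dangling nodes.
    """
    n = len(out_links)
    # Augmented matrix [I - d*M | (1 - d) * 1].
    a = [[1.0 if i == j else 0.0 for j in range(n)] + [1.0 - d] for i in range(n)]
    for q, targets in enumerate(out_links):
        for p in targets:
            a[p][q] -= d / len(targets)
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


out_links = [[1, 2], [2], [0], [0, 2]]  # hypothetical toy graph
d = 0.85
x = pagerank_linear_solve(out_links, d)

# Check the fixed-point equation PR(p) = (1 - d) + d * sum(PR(q) / outdeg(q)).
rhs = [1.0 - d] * len(out_links)
for q, targets in enumerate(out_links):
    for p in targets:
        rhs[p] += d * x[q] / len(targets)
print(x, max(abs(u - v) for u, v in zip(x, rhs)))
```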
Additional credentials (if it matters): PhD in Mathematics with a dissertation
in graph theory.