Andrew Ray created SPARK-18845:
----------------------------------

             Summary: PageRank has incorrect initialization value that leads to 
slow convergence
                 Key: SPARK-18845
                 URL: https://issues.apache.org/jira/browse/SPARK-18845
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 2.0.2, 1.6.3, 1.5.2, 1.4.1, 1.3.1, 1.2.2
            Reporter: Andrew Ray


All variants of PageRank in GraphX use an incorrect initialization value, which 
leads to slow convergence. The current implementations seed the ranks with the 
reset probability when they should be seeded with 1. This appears to have been 
introduced a long time ago in 
https://github.com/apache/spark/commit/15a564598fe63003652b1e24527c432080b5976c#diff-b2bf3f97dcd2f19d61c921836159cda9L90
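
To illustrate the convergence claim, here is a minimal sketch in plain Python. 
It is not the GraphX code. It assumes the non-normalized update 
rank = resetProb + (1 - resetProb) * (sum of incoming rank / outDegree), a 
reset probability of 0.15, and a small made-up graph with no sources and no 
sinks. The names pagerank and EDGES are hypothetical.

```python
# Hypothetical 4-vertex graph: every vertex has at least one incoming
# and one outgoing edge (no sources, no sinks).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]


def pagerank(edges, num_vertices, init_rank, reset_prob=0.15,
             tol=1e-9, max_iter=1000):
    """Iterate the non-normalized PageRank update until the largest
    per-vertex change drops below tol.

    Returns (ranks, number_of_iterations_performed).
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    ranks = [init_rank] * num_vertices
    for iteration in range(1, max_iter + 1):
        # Each vertex sends rank / outDegree along every outgoing edge.
        incoming = [0.0] * num_vertices
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]

        new_ranks = [reset_prob + (1.0 - reset_prob) * m for m in incoming]
        delta = max(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


# Compare the two seeds on the same graph.
ranks_one, iters_one = pagerank(EDGES, 4, init_rank=1.0)
ranks_reset, iters_reset = pagerank(EDGES, 4, init_rank=0.15)
print("seed 1.0 :", iters_one, "iterations")
print("seed 0.15:", iters_reset, "iterations")
```

Under these assumptions both seeds reach the same fixed point, whose ranks sum 
to the number of vertices. Seeding with 1 starts with the correct total, so 
it should need fewer iterations than seeding with the reset probability.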

The incorrect seed also hides the fact that source vertices (vertices with no 
incoming edges) are never updated. It goes unnoticed because the correct 
PageRank of a source vertex is generally* equal to the reset probability, which 
is exactly the value it is seeded with. Therefore both need to be fixed at once.

A PR will be added shortly.

*when there are no sinks -- but that's a separate bug


