[GitHub] spark pull request #16271: [SPARK-18845][GraphX] PageRank has incorrect init...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16271 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16271: [SPARK-18845][GraphX] PageRank has incorrect init...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/16271#discussion_r92621591 --- Diff: graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala --- @@ -70,10 +70,10 @@ class PageRankSuite extends SparkFunSuite with LocalSparkContext { val resetProb = 0.15 val errorTol = 1.0e-5 - val staticRanks1 = starGraph.staticPageRank(numIter = 1, resetProb).vertices - val staticRanks2 = starGraph.staticPageRank(numIter = 2, resetProb).vertices.cache() + val staticRanks1 = starGraph.staticPageRank(numIter = 2, resetProb).vertices --- End diff -- Not really more robust since it has a sink and thus is still wrong pending SPARK-18847. But it is needed with the change to fully propagate the change in rank of source vertices in the first iteration as explained above. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16271: [SPARK-18845][GraphX] PageRank has incorrect init...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16271#discussion_r92578668 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala --- @@ -225,11 +225,11 @@ object PageRank extends Logging { ctx => ctx.sendToDst(ctx.srcAttr :* ctx.attr), (a : BV[Double], b : BV[Double]) => a :+ b, TripletFields.Src) - rankGraph = rankGraph.joinVertices(rankUpdates) { -(vid, oldRank, msgSum) => - val popActivations: BV[Double] = msgSum :* (1.0 - resetProb) + rankGraph = rankGraph.outerJoinVertices(rankUpdates) { --- End diff -- Is this a related but slightly separate fix? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16271: [SPARK-18845][GraphX] PageRank has incorrect init...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16271#discussion_r92578753 --- Diff: graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala --- @@ -70,10 +70,10 @@ class PageRankSuite extends SparkFunSuite with LocalSparkContext { val resetProb = 0.15 val errorTol = 1.0e-5 - val staticRanks1 = starGraph.staticPageRank(numIter = 1, resetProb).vertices - val staticRanks2 = starGraph.staticPageRank(numIter = 2, resetProb).vertices.cache() + val staticRanks1 = starGraph.staticPageRank(numIter = 2, resetProb).vertices --- End diff -- A few extra iterations makes the test more robust? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16271: [SPARK-18845][GraphX] PageRank has incorrect init...
GitHub user aray opened a pull request: https://github.com/apache/spark/pull/16271 [SPARK-18845][GraphX] PageRank has incorrect initialization value that leads to slow convergence ## What changes were proposed in this pull request? Change the initial value in all PageRank implementations to be `1.0` instead of `resetProb` (default `0.15`) and use `outerJoinVertices` instead of `joinVertices` so that source vertices get updated in each iteration. This seems to have been introduced a long time ago in https://github.com/apache/spark/commit/15a564598fe63003652b1e24527c432080b5976c#diff-b2bf3f97dcd2f19d61c921836159cda9L90 With the exception of graphs with sinks (which currently give incorrect results see SPARK-18847) this gives faster convergence as the sum of ranks is already correct (sum or ranks should be number of vertices). Convergence comparision benchmark for small graph: http://imgur.com/a/HkkZf Code for benchmark: https://gist.github.com/aray/a7de1f3801a810f8b1fa00c271a1fefd ## How was this patch tested? (corrected) existing unit tests and additional test that verifies against result of igraph and NetworkX on a loop with a source. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aray/spark pagerank-initial-value Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16271.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16271 commit 9149ca233722e3d70aefb223dfd6a16ee8dbf924 Author: Andrew Ray Date: 2016-12-12T18:34:49Z fix commit b145376d88b6f5e58e2b9d051d9c268a36b9f939 Author: Andrew Ray Date: 2016-12-13T15:51:06Z fix initial value for grid graph independent calculation commit d39d2f07ab1a1aadb24dbd67bbbe37400beaadb4 Author: Andrew Ray Date: 2016-12-13T16:25:15Z use outer join so that sources are updated and fix reset probability for personalized commit 7ea03a88a3d9caa0ab7a7e6e681b8bf00b5cc128 Author: Andrew Ray Date: 2016-12-13T16:36:10Z fix star page rank test to account for sources getting updated in the first iteration which then changes the center in the second iteration --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org