[ https://issues.apache.org/jira/browse/SPARK-27718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840945#comment-16840945 ]
De-En Lin edited comment on SPARK-27718 at 5/16/19 2:55 AM:
------------------------------------------------------------

In the wiki, the PageRank equation is as follows:

!螢幕快照 2019-05-16 上午10.09.45.png!

At line 85 of the pagerank.py example in Spark,

{code:java}
ranks = contribs.reduceByKey(add).mapValues(lambda rank: rank * 0.85 + 0.15)
{code}

d is 0.85 and rank is the sum over PR(pj)/L(pj), corresponding to the PageRank equation. However, the term (1-d)/N appears in the code as 0.15: the code forgets to divide by N. Therefore, I changed the code to

{code:java}
ranks = contribs.reduceByKey(add).mapValues(
    lambda rank: rank * 0.85 + (1 / num_vals) * 0.15)
{code}

With this change, the result is correct and consistent with the NetworkX result, provided enough iterations are run.

was (Author: f422661):

In the wiki, the PageRank equation is as follows:

!螢幕快照 2019-05-16 上午10.09.45.png!

At line 85 of the pagerank.py example in Spark,

{code:java}
ranks = contribs.reduceByKey(add).mapValues(lambda rank: rank * 0.85 + 0.15)
{code}

d is 0.85 and rank is the sum over PR(pj)/L(pj). However, the term (1-d)/N appears in the code as 0.15: the code forgets to divide by N. Therefore, I changed the code to

{code:java}
ranks = contribs.reduceByKey(add).mapValues(
    lambda rank: rank * 0.85 + (1 / num_vals) * 0.15)
{code}

With this change, the result is correct and consistent with the NetworkX result, provided enough iterations are run.

> incorrect result from pagerank
> ------------------------------
>
>                 Key: SPARK-27718
>                 URL: https://issues.apache.org/jira/browse/SPARK-27718
>             Project: Spark
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.4.1
>            Reporter: De-En Lin
>            Priority: Minor
>         Attachments: 螢幕快照 2019-05-16 上午10.09.45.png
>
>
> When I executed /examples/src/main/python/pagerank.py, the result was as follows:
>
> {code:java}
> 1 has rank: 0.5821576292853757.
> 2 has rank: 0.3361551945789305.
> 3 has rank: 0.3361551945789305.
> 4 has rank: 0.3361551945789305.
> {code}
>
> However, when the same graph is run with NetworkX pagerank, the result is as follows:
>
> {code:java}
> {1: 0.4797305739863632, 2: 0.1734231420045456, 3: 0.1734231420045456, 4: 0.1734231420045456}
> {code}
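
As a rough check on the (1-d)/N point raised in the comment, here is a minimal pure-Python sketch (no Spark) that iterates both update rules side by side. The graph is an assumption: the issue does not list the edges, so the sketch uses a 4-node graph in which node 1 links to 2, 3 and 4, and each of those links back to 1. The function name `pagerank` and its `normalize` flag are hypothetical and do not come from the Spark example.

```python
def pagerank(links, damping=0.85, iterations=200, normalize=True):
    """Iterate the PageRank update on a dict {node: [out-neighbours]}.

    normalize=True  uses the teleport term (1 - d) / N, as in the wiki equation.
    normalize=False uses the teleport term (1 - d), as in the quoted line 85.
    Assumes every node has at least one outgoing link (no dangling nodes).
    """
    nodes = sorted(links)
    n = len(nodes)
    teleport = (1.0 - damping) / n if normalize else (1.0 - damping)
    # Start from a uniform vector: 1/N each (normalized) or 1.0 each.
    ranks = {node: (1.0 / n if normalize else 1.0) for node in nodes}

    for _ in range(iterations):
        # Each node splits its current rank evenly among its out-neighbours.
        contribs = {node: 0.0 for node in nodes}
        for src, dests in links.items():
            share = ranks[src] / len(dests)
            for dst in dests:
                contribs[dst] += share
        ranks = {node: damping * contribs[node] + teleport for node in nodes}
    return ranks


# Assumed graph, not taken from the issue text.
graph = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}

normalized = pagerank(graph, normalize=True)
unnormalized = pagerank(graph, normalize=False)

for node in sorted(graph):
    print(node, normalized[node], unnormalized[node])
```

Under these assumptions, the normalized ranks sum to 1 while the unnormalized ranks sum to N, and the two vectors differ only by that factor of N. That holds for this sketch because the assumed graph has no dangling nodes; it is not a claim about how the Spark example behaves on other inputs.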