GitHub user aray opened a pull request:

    https://github.com/apache/spark/pull/16483

    [SPARK-18847][GraphX] PageRank gives incorrect results for graphs with sinks

    ## What changes were proposed in this pull request?
    
    Graphs with sinks (vertices with no outgoing edges) don't have the expected
    rank sum of n (or 1 for personalized PageRank). We fix this by normalizing
    the ranks to the expected sum at the end of each implementation.
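
A minimal sketch of the sink problem and the normalization fix, written in plain Python rather than GraphX. It assumes the un-normalized update rule `reset_prob + (1 - reset_prob) * incoming`, with every rank starting at 1.0. The function names `pagerank_unnormalized` and `normalize_to_sum` are hypothetical and are not taken from the patch.

```python
def pagerank_unnormalized(num_vertices, edges, reset_prob=0.15, num_iter=50):
    """Power iteration in which rank sent to a sink is simply lost."""
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    ranks = [1.0] * num_vertices
    for _ in range(num_iter):
        incoming = [0.0] * num_vertices
        # Each vertex splits its rank evenly over its outgoing edges.
        # A sink has no outgoing edges, so its rank goes nowhere.
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = [reset_prob + (1.0 - reset_prob) * m for m in incoming]
    return ranks


def normalize_to_sum(ranks, target):
    """Rescale ranks so that they add up to `target`."""
    total = sum(ranks)
    return [r * target / total for r in ranks]


# Chain 0 -> 1 -> 2, where vertex 2 is a sink.
edges = [(0, 1), (1, 2)]
raw = pagerank_unnormalized(3, edges)
fixed = normalize_to_sum(raw, target=3)

print(sum(raw))    # well below n = 3 because rank leaks out at the sink
print(sum(fixed))  # n = 3, up to floating-point rounding
```

Rescaling preserves the relative ordering of the vertices; only the total changes.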
    
    Additionally, this fixes the dynamic version of personalized PageRank, which
    gave incorrect answers that were not detected by the existing unit tests.
    
    ## How was this patch tested?
    
    Revamped the existing unit tests and added new ones, with reference values
    (and reproduction code) from igraph and NetworkX.
    
    Note that for comparison on personalized PageRank we use the arpack
    algorithm in igraph, because prpack (the current default) redistributes
    rank from sinks to all vertices uniformly instead of just to the
    personalization source. We could adopt that alternate convention
    (redistribute rank to all vertices uniformly), but it would involve more
    extensive changes to the algorithms (the dynamic version would no longer be
    able to use Pregel).
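
The two sink conventions discussed above can be compared with a small sketch. This is plain Python written for illustration, not the igraph or GraphX implementation. The function `personalized_pagerank` and its `sink_target` parameter are hypothetical names, and ranks are kept as a probability vector that sums to 1.

```python
def personalized_pagerank(num_vertices, edges, source,
                          sink_target="source", alpha=0.85, num_iter=200):
    """Personalized PageRank with a configurable sink convention.

    sink_target="source":  rank held by sinks is returned to the source vertex.
    sink_target="uniform": rank held by sinks is spread over all vertices.
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    teleport = [1.0 if v == source else 0.0 for v in range(num_vertices)]
    if sink_target == "source":
        sink_dist = teleport
    else:
        sink_dist = [1.0 / num_vertices] * num_vertices

    ranks = teleport[:]
    for _ in range(num_iter):
        # Total rank currently sitting on vertices with no outgoing edges.
        sink_mass = sum(ranks[v] for v in range(num_vertices)
                        if out_degree[v] == 0)
        new_ranks = [(1.0 - alpha) * teleport[v] + alpha * sink_mass * sink_dist[v]
                     for v in range(num_vertices)]
        for src, dst in edges:
            new_ranks[dst] += alpha * ranks[src] / out_degree[src]
        ranks = new_ranks
    return ranks


# Chain 0 -> 1 -> 2 plus an isolated vertex 3; vertices 2 and 3 are sinks.
edges = [(0, 1), (1, 2)]
to_source = personalized_pagerank(4, edges, source=0, sink_target="source")
uniform = personalized_pagerank(4, edges, source=0, sink_target="uniform")

print(to_source[3])  # vertex 3 is unreachable from the source and gets no rank
print(uniform[3])    # positive: uniform redistribution reaches every vertex
```

Under the uniform convention, rank reaches vertices that are not reachable from the source, so the two conventions give different reference values on graphs with sinks.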


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aray/spark pagerank-sink2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16483
    
----
commit 41178a3f7310eb30b2eebbac1fac532196ad3432
Author: Andrew Ray <[email protected]>
Date:   2017-01-06T06:24:21Z

    page rank sink fixes and unit tests

----


