JoanFM opened a new pull request #26523: [SPARK-29877][GRAPHX] static PageRank allow checkPoint from previous computations
URL: https://github.com/apache/spark/pull/26523
 
 
   ### What changes were proposed in this pull request?
   Add an optional parameter to the staticPageRank computation that takes the result of a previous PageRank computation. This lets the algorithm start from a point that is closer to the converged configuration.
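   The idea can be illustrated with a small single-machine sketch. It is not the GraphX implementation or the PR's API, and it rests on a few assumptions:
   - The update rule is the non-personalized static PageRank rule, `resetProb + (1 - resetProb) * sum(rank(u) / outDegree(u))`.
   - The ranks start from all ones by default.
   - The final ranks are rescaled to sum to the number of vertices, in the spirit of SPARK-18847.

   `StaticPageRankSketch`, `SimpleGraph` and the `prevRanks` parameter are hypothetical names. The only point is that an optional seed replaces the default starting vector.

   ```scala
   object StaticPageRankSketch {
     // Hypothetical graph type: vertices are 0 until numVertices, edges are (src, dst) pairs.
     final case class SimpleGraph(numVertices: Int, edges: Seq[(Int, Int)])

     // Runs numIter iterations of static PageRank. If prevRanks is given (for example the
     // result of an earlier run), it seeds the iteration instead of the all-ones vector.
     def staticPageRank(
         graph: SimpleGraph,
         numIter: Int,
         resetProb: Double,
         prevRanks: Option[Vector[Double]] = None): Vector[Double] = {
       val n = graph.numVertices
       val outDeg = Array.fill(n)(0)
       graph.edges.foreach { case (src, _) => outDeg(src) += 1 }

       var ranks: Vector[Double] = prevRanks.getOrElse(Vector.fill(n)(1.0))
       require(ranks.length == n, "prevRanks must have one entry per vertex")

       for (_ <- 0 until numIter) {
         // Every vertex sends rank / outDegree along each of its outgoing edges.
         val incoming = Array.fill(n)(0.0)
         graph.edges.foreach { case (src, dst) => incoming(dst) += ranks(src) / outDeg(src) }
         ranks = Vector.tabulate(n)(v => resetProb + (1.0 - resetProb) * incoming(v))
       }

       // Sinks (vertices without outgoing edges) leak rank mass, so rescale the result
       // to sum to n, in the spirit of the SPARK-18847 normalization.
       val total = ranks.sum
       ranks.map(_ * n / total)
     }
   }
   ```

   Consider a graph without sinks. Running 2 iterations, then passing that result as `prevRanks` for 2 more, matches a direct 4-iteration run up to floating-point rounding. On a graph with a sink it does not, because the intermediate rescaling changes the seed.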
   
   
   ### Why are the changes needed?
   https://issues.apache.org/jira/browse/SPARK-29877
   
   When computing staticPageRank, it would be helpful to be able to use a previous computation as a checkpoint from which to continue the iterations.
   
   
   ### Does this PR introduce any user-facing change?
   Yes. It allows the static PageRank computation to start from the point where an earlier one finished.
   
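   Example: compute 10 iterations first, then continue for 3 more iterations.
   ```scala
   // Run 10 iterations from the default starting ranks.
   val partialPageRank = graph.ops.staticPageRank(numIter = 10, resetProb = 0.15)
   // Continue for 3 more iterations, seeded with the previous result.
   val continuationPageRank = graph.ops.staticPageRank(numIter = 3, resetProb = 0.15, Some(partialPageRank))
   ```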
   
   
   ### How was this patch tested?
   New tests were added. They work as follows:
   - Check how many iterations it takes for a static PageRank computation to converge.
   - Run the static PageRank computation for half of these iterations and take the result as a checkpoint.
   - Restart the computation from the checkpoint and check the number of iterations it takes to converge. It must never be larger than the original count, and in most cases it is much smaller.
   
   Because of sinks (vertices with no outgoing edges) and the normalization introduced in [[SPARK-18847]], running static PageRank for 2 iterations, taking the result as a checkpoint, and running 2 more iterations is not exactly equivalent to running 4 iterations directly.
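   The non-equivalence follows from the update rule under two assumptions: the non-personalized iteration starts from the all-ones vector, and each run rescales its final ranks to sum to $N$. The symbols below are introduced only for illustration:
   - $\alpha$ is the reset probability.
   - $M$ is the matrix with $M_{vu} = 1/\deg^{+}(u)$ for every edge $u \to v$.
   - $\mathcal{N}$ is the final rescaling.

   $$
   \begin{aligned}
   f(r) &= \alpha\mathbf{1} + (1-\alpha)\,M r, \qquad \mathcal{N}(r) = \frac{N}{\mathbf{1}^{\top} r}\, r,\\
   f^{2}(x) &= \alpha\bigl(\mathbf{1} + (1-\alpha)\,M\mathbf{1}\bigr) + (1-\alpha)^{2} M^{2} x,\\
   \text{direct: } \mathcal{N}\bigl(f^{4}(\mathbf{1})\bigr) &= \mathcal{N}\bigl(f^{2}(s)\bigr), \qquad \text{checkpointed: } \mathcal{N}\bigl(f^{2}(c\,s)\bigr), \qquad s = f^{2}(\mathbf{1}),\; c = \frac{N}{\mathbf{1}^{\top} s}.
   \end{aligned}
   $$

   Without sinks, every column of $M$ sums to 1. Then $\mathbf{1}^{\top} f(r) = \alpha N + (1-\alpha)\,\mathbf{1}^{\top} r$ keeps the total at $N$, so $c = 1$ and both runs agree.

   A sink gives an all-zero column, so rank mass leaks and $\mathbf{1}^{\top} s < N$, which makes $c > 1$. The checkpointed run then scales the propagated term $(1-\alpha)^{2} M^{2} s$ by $c$ but leaves the constant term unscaled. The final rescaling multiplies both terms by the same factor and cannot undo this, so the results differ in general.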
   
   However, this checkpointing can still give the algorithm a hint about the true distribution of PageRank values in the graph.
   
