I have a simple PageRank implementation for general-purpose graphs, written in Python on Hadoop Streaming. It uses the plain power method. The MapReduce algorithm is described in http://static.last.fm/johan/huguk-20090414/paolo_castagna-pagerank.pdf. One difference -- the transition probabilities along the edges are non-uniform in my implementation. For what it's worth, at the end of the ranking process the code generates a visualization of the network graph with the PageRank value of each vertex. This file can be viewed using GUESS (http://graphexploration.cond.org/). (Obviously, for web-scale datasets this visualization is worthless.)
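To make the "power method with non-uniform transition probabilities" idea concrete, here is a minimal single-machine sketch. It is not my Hadoop Streaming code; the function name `pagerank`, the `damping` value of 0.85, the edge-weight dictionary format, and the handling of dangling nodes are all assumptions made for illustration. The two commented phases only loosely mirror what a mapper and a reducer would do in each iteration.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Power-method PageRank on a weighted directed graph.

    edges: dict mapping source -> {target: weight}; weights are normalized
    per source, so transition probabilities need not be uniform.
    (Hypothetical input format, chosen for this sketch.)
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # "Map" phase: each vertex sends rank * P(u -> v) along its out-edges.
        contrib = defaultdict(float)
        dangling = 0.0
        for u in nodes:
            out = edges.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Assumption: rank of vertices without out-edges is spread uniformly.
                dangling += rank[u]
                continue
            for v, w in out.items():
                contrib[v] += rank[u] * w / total

        # "Reduce" phase: sum incoming contributions and apply damping.
        new_rank = {
            v: (1.0 - damping) / n + damping * (contrib[v] + dangling / n)
            for v in nodes
        }
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Tiny example: "a" links to "b" three times as strongly as to "c".
graph = {"a": {"b": 3.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
ranks = pagerank(graph)
```

In this example "b" ends up ranked above "c" purely because of the heavier edge weight; with uniform transition probabilities the two would tie.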
I was planning on porting my code to Mahout as a good way of learning more about Mahout. However, if Ken is going to contribute this code, and the code is going to be more scalable, then I can look at implementing something else -- perhaps TextRank, SimRank...

Let me know,
- Manish

On Thu, Jul 1, 2010 at 9:24 AM, Ken Krugler <[email protected]> wrote:

> On Jul 1, 2010, at 8:16am, Andrzej Bialecki wrote:
>
>> On 2010-06-30 21:11, Grant Ingersoll wrote:
>>
>>> On Jun 27, 2010, at 12:10 PM, Manish Katyal wrote:
>>>
>>>> Is there an implementation of the page-rank algorithm in Mahout?
>>>
>>> No, there isn't. However, do you mean to implement one specifically for
>>> link analysis or a general purpose one?
>>
>> There is one in Nutch, but it's tied to the Nutch API.
>
> It's likely we'll be contributing one to Mahout - either based on Jimmy
> Lin's enhancements as described during Hadoop Summit on Tuesday, or we might
> try the "do it all with SVD" approach as previously proposed by Ted, and
> mentioned by Jake.
>
> -- Ken
>
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c w e b m i n i n g
