Hi Julien,

Thanks for the information and the quick response. I'll check out the related JIRA issues.

Using an external library like Apache Giraph to implement LinkRank sounds like a great idea. But I'm wondering whether such an implementation would require Nutch to provide the WebGraph (in-link or out-link graph), making it necessary to port at least the WebGraph component to Nutch 2.1. Or is there another easy way to extract the link graph (WebGraph) information from Nutch? Actually, I should be able to work around my requirement if I can find a way to extract the WebGraph information from the Nutch data store.

thanks,
Thilina

On Mon, Oct 22, 2012 at 3:23 PM, Julien Nioche <[email protected]> wrote:

> Hi Thilina
>
> As you've probably seen in the list archives or on JIRA, this is indeed
> something we'd like to have. As for the expected time range, it is hard to
> say, as it depends on user contributions etc. Instead of porting the
> existing one from 1.x, I think we should delegate that to an external
> resource like Apache Giraph, which has an implementation AFAIK and would
> also be more efficient.
>
> In the meantime, the default OPIC score should provide you with a ranking
> of the pages, and it can also be customized via scoring plugins.
>
> HTH
>
> Julien
>
> On 22 October 2012 19:57, Thilina Gunarathne <[email protected]> wrote:
>
> > Dear all,
> > I noticed that WebGraph and LinkRank are missing from Nutch 2.1. Are
> > there any plans to port them to Nutch 2.1, and if so, what would be the
> > expected time range? Is it something that would be trivial, or something
> > that would involve a major rewrite? Please pardon my lack of knowledge
> > of the Nutch internals.
> >
> > Also, in the meantime, are there any other algorithms or implementations
> > that we can use for ranking the crawled web pages? How complex would it
> > be to write a PageRank implementation for the Nutch crawled data?
> >
> > thanks a lot in advance,
> > Thilina
> >
> > --
> > https://www.cs.indiana.edu/~tgunarat/
> > http://www.linkedin.com/in/thilina
> > http://thilina.gunarathne.org
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org

