If someone wants to adopt the plugin, it seems they have to start a new crawl. Is there a way to apply these scoring mechanisms to existing pages that have already been fetched, indexed and merged? Could you please shed some light on this?
Thanks

Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

Ken Krugler wrote:
>> Eugen Kochuev wrote:
>>> Hello Andrzej,
>>>
>>>> Please see the scoring API - you can write a plugin that manipulates
>>>> page scores according to your own idea.
>>>
>>> Thanks a lot for your answer, but could you please shed some more
>>> light onto scoring technique used in the Nutch?
>>> As I can see from the source code Nutch uses something similar to the
>>> pagerank algorithm propagating page scores through outlinks, but
>>> only one iteration is used (while pagerank requires several
>>> iterations to converge).
>>
>> That's a bit complicated subject - I could either explain this in
>> very general terms, or suggest that you read the paper that underlies
>> the current Nutch implementation (with a twist). Please see the
>> comment in OPICScoringFilter.java for the link to the paper.
>
> I've started writing up a description of the changes that I think need
> to be made to Nutch to really implement the OPIC algorithm, as
> described by the "Adaptive On-Line Page Importance Computation"
> paper (ACM 1-58113-680-3/03/0005).
>
> Should I just open a JIRA issue, and dump what might be a pretty long
> write-up into it?

Yes, please do - I'd love to implement this in that original form, even if it would go into another plugin ...

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
