If someone wants to adopt the plugin, it seems they have to start a new
crawl. Is there a way to apply these scoring mechanisms to pages that have
already been fetched, indexed, and merged?
Could you please shed some light on this?
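
For reference, here is a toy sketch of the single-pass propagation described
in the quoted thread below: each page splits its current score evenly among
its outlinks. This is only an illustration under my own assumptions, not
Nutch code and not the OPIC algorithm from the paper. The class name, the
method propagateOnce, and the three-page graph are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not taken from Nutch or OPICScoringFilter.
public class ScorePropagationSketch {

    /**
     * One propagation pass: every page splits its current score evenly
     * among its outlinks. Pages without outlinks pass nothing on.
     */
    static Map<String, Double> propagateOnce(Map<String, List<String>> outlinks,
                                             Map<String, Double> scores) {
        Map<String, Double> next = new HashMap<>();
        // Start every known page at zero so it appears in the result.
        for (String page : scores.keySet()) {
            next.put(page, 0.0);
        }
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            List<String> targets = e.getValue();
            if (targets.isEmpty()) {
                continue;
            }
            double share = scores.getOrDefault(e.getKey(), 0.0) / targets.size();
            for (String target : targets) {
                next.merge(target, share, Double::sum);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Made-up graph: a -> b, a -> c, b -> c, c -> a
        Map<String, List<String>> outlinks = new HashMap<>();
        outlinks.put("a", List.of("b", "c"));
        outlinks.put("b", List.of("c"));
        outlinks.put("c", List.of("a"));

        Map<String, Double> scores = new HashMap<>();
        scores.put("a", 1.0);
        scores.put("b", 1.0);
        scores.put("c", 1.0);

        // A single pass, as in the "only one iteration" observation.
        System.out.println("one pass:  " + propagateOnce(outlinks, scores));

        // Repeating the pass is what an iterative scheme would do instead.
        Map<String, Double> current = scores;
        for (int i = 0; i < 50; i++) {
            current = propagateOnce(outlinks, current);
        }
        System.out.println("50 passes: " + current);
    }
}
```

Since a pass like this only needs the link graph and the current scores, the
question is whether something similar can be run over data that is already
stored, without fetching the pages again.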

Thanks


Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Ken Krugler wrote:
>> Eugen Kochuev wrote:
>>> Hello Andrzej,
>>>
>>>> Please see the scoring API - you can write a plugin that manipulates
>>>> page scores according to your own idea.
>>>
>>> Thanks a lot for your answer, but could you please shed some more
>>> light onto scoring technique used in the Nutch?
>>> As I can see from the source code Nutch uses something similar to the
>>> pagerank algorithm propagating page scores through outlinks, but only
>>> one iteration is used (while pagerank requires several iterations to
>>> converge).
>>
>> That's a fairly complicated subject - I could either explain this in 
>> very general terms, or suggest that you read the paper that underlies 
>> the current Nutch implementation (with a twist). Please see the 
>> comment in OPICScoringFilter.java for the link to the paper.
>
> I've started writing up a description of the changes that I think need 
> to be made to Nutch to really implement the OPIC algorithm, as 
> described by the "Adaptive On-Line Page Importance Computation" 
> paper (ACM 1-58113-680-3/03/0005).
>
> Should I just open a JIRA issue, and dump what might be a pretty long 
> write-up into it?

Yes, please do - I'd love to implement this in that original form, even 
if it would go into another plugin ...

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



