Eugen Kochuev wrote:
Hello Andrzej,
Please see the scoring API - you can write a plugin that manipulates
page scores according to your own idea.
Thanks a lot for your answer, but could you please shed some more
light on the scoring technique used in Nutch?
As far as I can see from the source code, Nutch uses something similar
to the PageRank algorithm, propagating page scores through outlinks,
but only one iteration is used (while PageRank requires several
iterations to converge).
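
To make the one-iteration point concrete, here is a rough sketch of
propagating scores through outlinks. It is not Nutch code: the graph,
the function names, and the equal-split rule are assumptions made for
illustration, and it leaves out the damping factor that real PageRank
uses.

```python
def propagate_once(scores, outlinks):
    """One pass: every page splits its current score equally among its outlinks."""
    new_scores = {page: 0.0 for page in scores}
    for page, score in scores.items():
        targets = outlinks.get(page, [])
        if not targets:
            # Assumption: a page with no outlinks keeps its score,
            # so the total score in the graph is conserved.
            new_scores[page] += score
            continue
        share = score / len(targets)
        for target in targets:
            new_scores[target] += share
    return new_scores


def propagate(scores, outlinks, iterations):
    """Repeat the single pass the given number of times."""
    for _ in range(iterations):
        scores = propagate_once(scores, outlinks)
    return scores


# Hypothetical three-page link graph, every page starting with score 1.0.
outlinks = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
initial = {"a": 1.0, "b": 1.0, "c": 1.0}

one_pass = propagate(initial, outlinks, 1)
many_passes = propagate(initial, outlinks, 50)
```

On this toy graph a single pass gives a=1.0, b=0.5, c=1.5, while
repeated passes settle near a=1.2, b=0.6, c=1.2, so one iteration
ranks the pages differently from the converged result.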
That's a somewhat complicated subject - I could either explain this in
very general terms, or suggest that you read the paper that
underlies the current Nutch implementation (with a twist). Please
see the comment in OPICScoringFilter.java for the link to the paper.
I've started writing up a description of the changes that I think
need to be made to Nutch to really implement the OPIC algorithm, as
described by the "Adaptive On-Line Page Importance Computation"
paper (ACM 1-58113-680-3/03/0005).
Should I just open a JIRA issue, and dump what might be a pretty long
write-up into it?
Thanks,
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"