Eugen Kochuev wrote:
> Hello Andrzej,
>> Please see the scoring API - you can write a plugin that manipulates
>> page scores according to your own ideas.
> Thanks a lot for your answer, but could you please shed some more
> light on the scoring technique used in Nutch?
> As far as I can see from the source code, Nutch uses something similar to the
> PageRank algorithm, propagating page scores through outlinks, but only one
> iteration is used (while PageRank requires several iterations to
> converge).
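To make the one-pass versus several-iterations distinction concrete, here is a toy PageRank in Java (the language Nutch is written in). It is a sketch on a made-up three-page graph, not Nutch code; the class and method names and the damping factor 0.85 are illustrative assumptions. `pass` pushes every page's score along its outlinks once, roughly the single step described in the question, while `converge` repeats that pass until the scores stop changing, as textbook PageRank does.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy PageRank on an in-memory link graph (illustration only, not Nutch code).
 * Pages are numbered 0..n-1; outlinks.get(p) lists the targets of page p.
 */
public class ToyPageRank {

    /** One propagation pass with damping factor d. */
    static double[] pass(List<int[]> outlinks, double[] scores, double d) {
        int n = scores.length;
        double[] next = new double[n];
        Arrays.fill(next, (1.0 - d) / n);              // "random jump" share for every page
        for (int page = 0; page < n; page++) {
            int[] targets = outlinks.get(page);
            if (targets.length == 0) {
                // Dangling page: spread its score evenly over all pages.
                for (int t = 0; t < n; t++) next[t] += d * scores[page] / n;
            } else {
                // Split the page's score evenly among its outlinks.
                for (int t : targets) next[t] += d * scores[page] / targets.length;
            }
        }
        return next;
    }

    /** Repeats pass() until the total (L1) change drops below eps, or maxIter is hit. */
    static double[] converge(List<int[]> outlinks, double d, double eps, int maxIter) {
        int n = outlinks.size();
        double[] scores = new double[n];
        Arrays.fill(scores, 1.0 / n);
        for (int i = 0; i < maxIter; i++) {
            double[] next = pass(outlinks, scores, d);
            double delta = 0;
            for (int p = 0; p < n; p++) delta += Math.abs(next[p] - scores[p]);
            scores = next;
            if (delta < eps) break;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Links: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        List<int[]> graph = List.of(new int[]{1, 2}, new int[]{2}, new int[]{0});
        double[] uniform = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        System.out.println("one pass:  " + Arrays.toString(pass(graph, uniform, 0.85)));
        System.out.println("converged: " + Arrays.toString(converge(graph, 0.85, 1e-9, 100)));
    }
}
```

On this graph the single pass and the converged vector differ noticeably, which is the gap the question is pointing at.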
That's a rather complicated subject. I could either explain it in very
general terms or suggest that you read the paper that underlies the
current Nutch implementation (with a twist). Please see the comment in
OPICScoringFilter.java for the link to the paper.
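The core idea of OPIC (On-line Page Importance Computation) can be sketched in a few lines. This is a simplified toy under stated assumptions (in-memory graph, round-robin "crawl", dangling pages spreading their cash over all pages, importance estimated as a page's history divided by the total history); it is not the OPICScoringFilter code and ignores whatever twist Nutch adds. Each page holds some "cash"; visiting a page records that cash in the page's "history" and hands it on to its outlinks. Because the estimate improves a little with every visit, there is no separate iterate-until-convergence phase.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy OPIC on a small in-memory graph. Illustration of the general idea
 * only; NOT Nutch's OPICScoringFilter. Total cash always stays 1.
 */
public class ToyOpic {
    final List<int[]> outlinks;
    final double[] cash;
    final double[] history;
    double totalHistory = 0;

    ToyOpic(List<int[]> outlinks) {
        this.outlinks = outlinks;
        int n = outlinks.size();
        cash = new double[n];
        history = new double[n];
        Arrays.fill(cash, 1.0 / n);   // cash starts evenly spread over all pages
    }

    /** "Crawl" one page: bank its cash in its history, then pass the cash on. */
    void visit(int page) {
        double c = cash[page];
        history[page] += c;
        totalHistory += c;
        cash[page] = 0;
        int[] targets = outlinks.get(page);
        if (targets.length == 0) {
            // Dangling page: give the cash to every page (a simplification).
            for (int t = 0; t < cash.length; t++) cash[t] += c / cash.length;
        } else {
            for (int t : targets) cash[t] += c / targets.length;
        }
    }

    /** Current (simplified) importance estimate of a page. */
    double importance(int page) {
        return totalHistory == 0 ? 0 : history[page] / totalHistory;
    }

    public static void main(String[] args) {
        // Links: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        List<int[]> graph = List.of(new int[]{1, 2}, new int[]{2}, new int[]{0});
        ToyOpic opic = new ToyOpic(graph);
        for (int round = 0; round < 1000; round++)
            for (int p = 0; p < graph.size(); p++) opic.visit(p);   // round-robin crawl
        for (int p = 0; p < graph.size(); p++)
            System.out.printf("page %d: %.4f%n", p, opic.importance(p));
    }
}
```

On this graph the estimates approach 0.4, 0.2 and 0.4, the stationary distribution of a random walk on the same links (PageRank without damping), even though each page is only ever visited and never "iterated" in the PageRank sense.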
> My other questions are about the db.score.injected and
> db.score.link.internal parameters. They are listed in
> nutch-default.xml, but are never referenced in the code.
db.score.injected is used in the above-mentioned OPIC scoring plugin
and in CrawlDbReducer. db.score.link.internal could be used in these
places, but isn't. Please file a bug report; this needs to be fixed (if
we really want it fixed, i.e. if we really want to distinguish between
internal and external links when calculating score contributions and
setting initial scores).
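If such a distinction were added, one possible shape is sketched below. Everything here is hypothetical: `internalFactor` and `externalFactor` are made-up knobs in the spirit of db.score.link.internal, "internal" is assumed to mean "same host", and the class is not part of Nutch. It splits a page's score evenly among its outlinks and then scales each share by whether the link stays on the same host.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Nutch code): weight same-host ("internal") and
 * other-host ("external") outlinks differently when passing score on.
 * URLs are assumed to be well-formed absolute URLs.
 */
public class LinkScoreSplit {

    /** True when both URLs name the same host (case-insensitive). */
    static boolean sameHost(String from, String to) {
        String a = URI.create(from).getHost();
        String b = URI.create(to).getHost();
        return a != null && a.equalsIgnoreCase(b);
    }

    /** Score contribution per target URL; duplicate links add up. */
    static Map<String, Double> contributions(String fromUrl, double score, List<String> outlinks,
                                             double internalFactor, double externalFactor) {
        Map<String, Double> result = new LinkedHashMap<>();
        if (outlinks.isEmpty()) return result;
        double share = score / outlinks.size();        // plain even split first
        for (String to : outlinks) {
            double factor = sameHost(fromUrl, to) ? internalFactor : externalFactor;
            result.merge(to, share * factor, Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> c = contributions("http://example.com/a", 1.0,
                List.of("http://example.com/b", "http://other.org/"), 0.5, 1.0);
        System.out.println(c);
    }
}
```

With internalFactor = 0.5 and externalFactor = 1.0, the same-host link receives 0.25 and the off-site link 0.5, so the total handed out no longer equals the page's score; whether to renormalize afterwards is one of the design choices a real fix would have to make.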
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general