Stas wrote:
How does Nutch rank its results?
Does it use only Lucene's internal relevance rules?
Or does it also use link and tag weights, as Google does for example, or any other web-search-specific technology?

Nutch uses Lucene, but I wouldn't say it "only" uses Lucene. Lucene's scoring is extensible, and Nutch takes advantage of that.


Nutch indexes incoming anchor text in a field named "anchors". Incoming anchor text is searched along with page content, but with slightly different parameters.
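To make the idea of searching two fields with different parameters concrete, here is a minimal sketch in plain Java. It is not Nutch or Lucene code: the class name, the weights, and the term-counting score are all assumptions made up for illustration, and the real parameters are in the source files listed further down.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: not Nutch or Lucene code.
final class FieldWeightedMatch {
    // Hypothetical per-field weights, invented for this example.
    static final Map<String, Double> FIELD_WEIGHTS = new LinkedHashMap<>();
    static {
        FIELD_WEIGHTS.put("content", 1.0);
        FIELD_WEIGHTS.put("anchors", 2.0);
    }

    // Counts occurrences of the term in lowercased, whitespace-separated text.
    static int termFrequency(String text, String term) {
        String wanted = term.toLowerCase();
        int count = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    // Sums the weighted term frequencies over every known field of one document.
    static double score(Map<String, String> docFields, String term) {
        double total = 0.0;
        for (Map.Entry<String, Double> field : FIELD_WEIGHTS.entrySet()) {
            String text = docFields.getOrDefault(field.getKey(), "");
            total += field.getValue() * termFrequency(text, term);
        }
        return total;
    }
}
```

With these made-up weights, a page whose own text mentions "search" once and whose incoming anchors mention it twice scores 1.0 * 1 + 2.0 * 2 = 5.0 for that term.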

Nutch can also perform link analysis, akin to Google's, and it uses the resulting link analysis scores as the Lucene boost for pages.
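As a rough sketch of what using a link score as a boost means: a boost acts as a multiplier on the query-dependent text score, so a well-linked page can outrank a page that matches the text slightly better. The class below is plain Java with hypothetical names and numbers, not Nutch or Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: not Nutch or Lucene code.
final class BoostedRanking {
    static final class Page {
        final String url;
        final double textScore; // query-dependent match score (hypothetical)
        final double linkBoost; // query-independent link analysis score (hypothetical)

        Page(String url, double textScore, double linkBoost) {
            this.url = url;
            this.textScore = textScore;
            this.linkBoost = linkBoost;
        }

        // The boost multiplies the text score.
        double score() {
            return textScore * linkBoost;
        }
    }

    // Returns the page URLs ordered from highest to lowest boosted score.
    static List<String> rank(List<Page> pages) {
        List<Page> sorted = new ArrayList<>(pages);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));
        List<String> urls = new ArrayList<>();
        for (Page page : sorted) {
            urls.add(page.url);
        }
        return urls;
    }
}
```

For example, a page with text score 1.5 and boost 2.0 (boosted score 3.0) ranks ahead of a page with text score 2.0 and boost 1.0 (boosted score 2.0).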

There's not yet a great document describing Nutch's scoring formula, and it's also a moving target that will probably vary between Nutch installations.

The authoritative reference is of course the source code. Some key classes involved in Nutch's use of Lucene are:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/indexer/NutchSimilarity.java?view=markup
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/indexer/IndexSegment.java?view=markup
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/searcher/QueryTranslator.java?view=markup

Doug


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
