Lucene has a basic scoring algorithm based on tf, idf,
and field boost values.

Nutch also adopts the PageRank concept through its own
link analysis, via the DistributedAnalysisTool
class.

Actually, I don't think most people run this. I believe it starts to have performance issues when your page count gets large, which is one of the reasons for the mapred work being done by Doug/Mike in a branch.

Typically the extent of "link analysis" is the number of inbound links to a page, which is recalculated whenever the WebDB is updated following a crawl.
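
As a rough illustration of what "number of inbound links" means here, the sketch below tallies inbound links per target URL from a list of (from, to) pairs. It is not the WebDB code; the class name, method name, and example URLs are made up for illustration, and the real update runs over the WebDB's own page and link records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Nutch code: count inbound links per target URL.
final class InlinkCountSketch {

    /** Each link is a {fromUrl, toUrl} pair; returns toUrl -> inbound link count. */
    static Map<String, Integer> countInlinks(List<String[]> links) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] link : links) {
            // link[1] is the target page; add one inbound link to its tally.
            counts.merge(link[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> links = List.of(
            new String[] {"http://a.example/", "http://c.example/"},
            new String[] {"http://b.example/", "http://c.example/"},
            new String[] {"http://c.example/", "http://a.example/"});
        // c.example has two inbound links, a.example has one, b.example has none.
        System.out.println(countInlinks(links));
    }
}
```

Pages with no inbound links simply do not appear in the map, so a caller would treat a missing entry as zero.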

But when I took a look at the "score in detail" page for Nutch's
search results, I didn't see a factor called "link
analysis" or anything like that.

Where can I see this factor, or is it already combined
into the score value we saw on the score detail page?

See my previous post on how inbound link counts are used to boost a Lucene document (web page).
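
A minimal sketch of the general idea, in case the earlier post is not at hand. The formula below (page score times a log-damped inbound link count) is an assumption chosen for illustration and is not quoted from this thread or confirmed against the Nutch source; the class and method names are hypothetical. As far as I recall, a document boost set at index time is folded into Lucene's stored field norms, which would explain why the score detail page shows no separately named link factor.

```java
// Hypothetical sketch: turn an inbound link count into a document boost.
// The log damping is an assumed formula, not taken from the Nutch source.
final class LinkBoostSketch {

    /**
     * Returns pageScore scaled by ln(e + inlinkCount), so a page with no
     * inbound links keeps its score and extra links give diminishing returns.
     */
    static float boost(float pageScore, int inlinkCount) {
        if (inlinkCount < 0) {
            throw new IllegalArgumentException("inlinkCount must be >= 0");
        }
        return pageScore * (float) Math.log(Math.E + inlinkCount);
    }

    public static void main(String[] args) {
        for (int inlinks : new int[] {0, 1, 10, 100, 1000}) {
            System.out.println(inlinks + " inlinks -> boost " + boost(1.0f, inlinks));
        }
    }
}
```

The value returned would be applied to the document as its boost when the page is indexed; the log keeps heavily linked pages from swamping the tf/idf part of the score.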

-- Ken
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
