Another question is: what are the consequences of NOT running the "analyze" step? How does this affect the fetchlist generation, and the search scoring?
Skipping it works fine; the "crawl" command does exactly that.
The indexer can optionally use log(number of incoming links) as a simplified link analysis score that does not require running "analyze". The "crawl" tool specifies this option (indexer.boost.by.link.count), and it works pretty well.
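A minimal sketch of the idea, assuming only what is stated above: a page's score is log(number of incoming links), so no separate "analyze" pass is needed. The class and method names are hypothetical and this is not Nutch's actual indexer code. Treating a page with zero inlinks as score 0 (rather than log(0) = -Infinity) is our own assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: a simplified link-analysis score computed as
 * log(number of incoming links), with no separate "analyze" pass.
 */
public class LinkCountScore {

    /** Returns log(inlinkCount); counts of 0 or 1 score 0 (clamp is an assumption). */
    public static float score(int inlinkCount) {
        // Math.log(0) is -Infinity and Math.log(1) is 0, so clamp small counts to 0.
        return inlinkCount <= 1 ? 0.0f : (float) Math.log(inlinkCount);
    }

    /** Orders URLs by descending link-count score (stable for ties). */
    public static List<String> rank(Map<String, Integer> inlinks) {
        List<String> urls = new ArrayList<>(inlinks.keySet());
        urls.sort((a, b) -> Float.compare(score(inlinks.get(b)), score(inlinks.get(a))));
        return urls;
    }

    public static void main(String[] args) {
        Map<String, Integer> inlinks = new TreeMap<>();
        inlinks.put("http://a.example/", 1000);
        inlinks.put("http://b.example/", 10);
        inlinks.put("http://c.example/", 1);
        for (String url : rank(inlinks)) {
            System.out.printf("%s %.3f%n", url, score(inlinks.get(url)));
        }
    }
}
```

The logarithm compresses the range: a page with 1000 inlinks scores exactly three times a page with 10 inlinks (log 1000 = 3 log 10), not 100 times, so heavily linked pages are favored without swamping everything else.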
It would also be useful if fetchlist generation could similarly prioritize by number of incoming links. This would be easy to add. Simply change line 501 of FetchListTool.java to something like:
  curScore.set(scoreByLinkCount
               ? (float)Math.log(results.length)
               : page.getScore());

where scoreByLinkCount is determined by a config file property.
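A standalone sketch of the proposed switch, assuming (as the snippet implies) that results.length is the page's incoming-link count. The property name fetchlist.score.by.link.count is hypothetical, not an existing Nutch setting, and java.util.Properties stands in for Nutch's own config mechanism. Unlike the one-line change above, this sketch also guards against a page with zero inlinks, which would otherwise get log(0) = -Infinity; that guard is our assumption.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of the proposed change: pick the fetchlist score either
 * from the page's stored score or from log(incoming link count),
 * depending on a boolean config property.
 */
public class FetchListScoring {

    // Hypothetical property name; not an existing Nutch setting.
    public static final String PROP = "fetchlist.score.by.link.count";

    private final boolean scoreByLinkCount;

    public FetchListScoring(Properties conf) {
        // Defaults to false, i.e. keep using the page's analyzed score.
        this.scoreByLinkCount = Boolean.parseBoolean(conf.getProperty(PROP, "false"));
    }

    /** inlinkCount stands in for results.length; pageScore for page.getScore(). */
    public float score(int inlinkCount, float pageScore) {
        if (!scoreByLinkCount) {
            return pageScore;
        }
        // Assumed guard: avoid -Infinity for pages with no inlinks.
        return inlinkCount <= 1 ? 0.0f : (float) Math.log(inlinkCount);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PROP, "true");
        FetchListScoring byLinks = new FetchListScoring(conf);
        FetchListScoring byPage = new FetchListScoring(new Properties());
        System.out.println(byLinks.score(100, 0.5f)); // log(100)
        System.out.println(byPage.score(100, 0.5f));  // stored page score
    }
}
```

Reading the flag once in the constructor keeps the per-page scoring path a plain branch, which matters when the fetchlist loop runs over millions of pages.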
If I get a chance I'll try to add this, or maybe you can first.
Doug
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
