Hi Mark, I just put this up on the wiki. Hope it helps:
http://wiki.apache.org/nutch/OptimizingCrawls

Dennis

Mark Kerzner wrote:
Hi, guys, my goal is to do my crawls at 100 fetches per second while, of course, observing polite crawling. When the URLs are all from different domains, what, in theory, would stop software from downloading from 100 domains at once and achieving the desired speed? Yet whatever I do, I can't make Nutch crawl at that speed. Even if it starts at a few dozen URLs per second, it slows down toward the end (as discussed by many, including Krugler). Should I write something of my own, or are there fast crawlers?
Thanks!
Mark
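The reasoning in the question can be made concrete with a small simulation. This is a sketch under simplifying assumptions, not a model of how Nutch's fetcher actually schedules work. Each host gets at most one request every `delay` seconds (Nutch's own per-host politeness delay is the `fetcher.server.delay` property). All hosts are fetched in parallel with unlimited threads, and network latency is ignored. The functions `fetch_schedule` and `rate_by_window` and the host names are hypothetical.

```python
from collections import Counter


def fetch_schedule(urls_per_host, delay):
    """Return the start time of every fetch under a strict politeness rule:
    at most one request per host every `delay` seconds, all hosts in parallel,
    fetch latency ignored (a hypothetical simplification)."""
    times = []
    for count in urls_per_host.values():
        # Host's i-th URL can start no earlier than i * delay seconds.
        times.extend(i * delay for i in range(count))
    return sorted(times)


def rate_by_window(times, window):
    """Average fetches per second within each consecutive `window`-second bucket."""
    buckets = Counter(int(t // window) for t in times)
    last = max(buckets)
    return [buckets.get(b, 0) / window for b in range(last + 1)]


# Hypothetical fetch list: 100 small hosts with 10 URLs each,
# plus one large host with 100 URLs, and a 1-second per-host delay.
hosts = {f"host{i}.example": 10 for i in range(100)}
hosts["big.example"] = 100

rates = rate_by_window(fetch_schedule(hosts, delay=1.0), window=10)
print(rates)
```

Under these assumptions the first 10 seconds run at about 101 fetches per second, because 101 hosts are active at once. After that the rate collapses to 1 fetch per second while the single large host drains its queue under the politeness delay. This is one plausible explanation for a crawl that "slows down at the end": politeness caps the rate at roughly (hosts still holding URLs) / delay, so a fetch list dominated by a few hosts is limited by those hosts rather than by bandwidth or thread count.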