Hi Mark,

I recently contributed two patches on JIRA (NUTCH-769 and NUTCH-770) that affect crawl speed and should help with the fetch rate slowing down. https://issues.apache.org/jira/browse/NUTCH-753 should also help, though to a lesser extent.

Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com

2009/11/24 Mark Kerzner <markkerz...@gmail.com>

> Hi, guys,
>
> my goal is to do my crawls at 100 fetches per second, observing, of course,
> polite crawling. But, when URLs are all from different domains, what
> theoretically would stop some software from downloading from 100 domains at
> once, achieving the desired speed?
>
> But, whatever I do, I can't make Nutch crawl at that speed. Even if it
> starts at a few dozen URLs/second, it slows down at the end (as discussed by
> many and by Krugler).
>
> Should I write something of my own, or are there fast crawlers?
>
> Thanks!
>
> Mark
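
For readers following the thread, here is a rough back-of-the-envelope sketch of the arithmetic behind the question. It is not Nutch code and not taken from the patches mentioned above. It assumes a simple politeness model in which each host receives at most one request every `delaySeconds`; the class name, method names, host names, and the 1-second delay are all hypothetical. Under that assumption, the achievable rate is bounded by the number of hosts that still have URLs left, which is one plausible reason a crawl slows down towards the end.

```java
import java.util.Map;

public class PoliteCrawlEstimate {

    // Upper bound on fetches per second: with one request per host every
    // delaySeconds, each host that still has URLs contributes at most
    // 1 / delaySeconds. Download time and thread limits are ignored.
    static double maxFetchesPerSecond(int hostsWithUrlsLeft, double delaySeconds) {
        return hostsWithUrlsLeft / delaySeconds;
    }

    // Lower bound on total crawl time: the host with the most URLs needs
    // (n - 1) waits between its n requests, however many threads are idle.
    static double minCrawlSeconds(Map<String, Integer> urlsPerHost, double delaySeconds) {
        int largest = 0;
        for (int n : urlsPerHost.values()) {
            largest = Math.max(largest, n);
        }
        return Math.max(0, largest - 1) * delaySeconds;
    }

    public static void main(String[] args) {
        double delay = 1.0; // hypothetical per-host delay in seconds

        // 100 hosts with URLs left: the 100 fetches/second target is reachable in principle.
        System.out.println(maxFetchesPerSecond(100, delay));

        // Only 3 hosts left near the end of the crawl: the bound drops accordingly.
        System.out.println(maxFetchesPerSecond(3, delay));

        // One host holding 1000 URLs sets the floor on total crawl time.
        System.out.println(minCrawlSeconds(Map.of("a.example", 5, "b.example", 1000), delay));
    }
}
```

In this model, 100 fetches per second requires at least 100 × `delaySeconds` hosts with URLs remaining at every moment, so a fetch list dominated by a few large hosts cannot sustain that rate no matter how many threads are available.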