Hi,

My config is:

Nutch 1.0
generate.max.per.host = 130
fetcher.server.delay = 5
fetcher.threads.fetch = 50
number of hosts in seeds = 30

If the fetch were efficient, it would take about 130 * 6 seconds (5 s delay + 1 s margin for imprecision) = 13 min. According to the results, a fetch actually lasts 26 minutes.

When I analyse hadoop.log, I notice that some sites are fetched during the first 13 minutes, while the other sites, which are not fetched at all before the 13th minute, only start being fetched after it. These sites are then fetched until the 26th minute.

So I conclude that the fetch lasts twice as long as it should, because some of the sites are fetched only after the others. (Some STATS are produced between the two steps.)

How can we prevent this split? I mean, how can I force all sites to be fetched from the beginning?

Thanks in advance for your help.
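P.S. For reference, here is the arithmetic behind my estimate as a small Python sketch. It assumes each host is fetched sequentially, one URL at a time, and that every request costs the configured delay plus roughly 1 second of overhead (the 1 s figure is my own guess, not a measured value). The function and variable names are made up for illustration and are not part of Nutch.

```python
def expected_fetch_seconds(urls_per_host, server_delay_s, overhead_s=1.0):
    """Rough lower bound for one host's fetch time.

    Assumes the host's URLs are fetched strictly one after another,
    each costing the politeness delay plus a fixed overhead (a guess).
    """
    return urls_per_host * (server_delay_s + overhead_s)


# Values from my config: generate.max.per.host = 130, fetcher.server.delay = 5
expected = expected_fetch_seconds(130, 5)

# Duration I actually observe for one fetch, in seconds
observed = 26 * 60

print(expected / 60)        # expected duration in minutes
print(observed / expected)  # how many times longer than expected
```

If all 30 hosts were fetched in parallel from the start, the total should be close to the per-host figure. The ratio of 2 is what makes me think the hosts are being processed in two successive groups.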

