Michael Rosset wrote:
I'm getting what seem to be excessive http.max.delays errors after a
recent cvs update. Should I be looking at lowering the number of threads
I am using, or is it normal to keep the fetcher running at a higher
rate?

What percentage of requests is generating this kind of error? Is it for servers that have more URLs than you have time to fetch, given your fetcher.server.delay? If so, you can lower fetcher.server.delay, accept that you cannot fetch all of those pages, or try raising http.max.delays.
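For reference, both settings are ordinary Nutch configuration properties and can be overridden in nutch-site.xml. The values below are illustrative examples only, not recommendations:

```xml
<!-- Hypothetical overrides in nutch-site.xml; values are examples. -->
<property>
  <name>fetcher.server.delay</name>
  <value>1.0</value> <!-- seconds a host must be idle between requests -->
</property>
<property>
  <name>http.max.delays</name>
  <value>3</value> <!-- times a thread may wait on a busy host before giving up on that URL -->
</property>
```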


URLs in a fetchlist are sorted by MD5 hash, which randomizes them by host. So a host that appears in a large proportion of URLs should have its URLs spread evenly through the fetchlist. Each thread grabs URLs from the fetchlist and delays until the host has been free for fetcher.server.delay seconds. If it delays more than http.max.delays times, it gives up on that URL. That's the algorithm today. RequestScheduler used to keep a queue of URLs per host and drop URLs when the queues grew too long, with a similar effect.
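The per-host delay logic above can be sketched as follows. This is a minimal single-threaded simulation, not the actual Nutch fetcher code: the class and method names (HostDelaySketch, tryFetch) are invented for illustration, and time is passed in explicitly so the waits are simulated rather than slept.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fetcher's per-host politeness check: a URL is fetched
// only once its host has been free for serverDelayMs; after more than
// maxDelays simulated waits, the thread gives up on that URL.
public class HostDelaySketch {
    private final long serverDelayMs; // fetcher.server.delay, in milliseconds
    private final int maxDelays;      // http.max.delays
    private final Map<String, Long> lastFetch = new HashMap<>();

    public HostDelaySketch(long serverDelayMs, int maxDelays) {
        this.serverDelayMs = serverDelayMs;
        this.maxDelays = maxDelays;
    }

    // Returns true if the host was free (or became free within maxDelays
    // simulated waits); false if the thread gives up on the URL.
    public boolean tryFetch(String host, long nowMs) {
        int delays = 0;
        long now = nowMs;
        while (true) {
            Long last = lastFetch.get(host);
            if (last == null || now - last >= serverDelayMs) {
                lastFetch.put(host, now); // host is free: fetch and stamp it
                return true;
            }
            delays++;
            if (delays > maxDelays) {
                return false; // delayed more than http.max.delays times: give up
            }
            now += serverDelayMs; // simulate waiting one server-delay interval
        }
    }

    public static void main(String[] args) {
        HostDelaySketch f = new HostDelaySketch(1000, 0);
        System.out.println(f.tryFetch("example.com", 0L));    // → true: host never seen
        System.out.println(f.tryFetch("example.com", 500L));  // → false: busy, no delays allowed
        System.out.println(f.tryFetch("example.com", 1000L)); // → true: delay has elapsed
    }
}
```

With a larger http.max.delays the second call above would wait out the interval and succeed instead of giving up, which is why raising that setting trades fetch throughput for coverage of URL-heavy hosts.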

Doug


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
