Matt Zytaruk wrote:

Hi all.

I'm trying to do a full crawl (all the pages on each site) of about 100 sites. Unfortunately, I'm getting as many errors as successful fetches, almost all of them max.delays.exceeded. Is there any way to cut down on this error? I tried setting the max.delays property in the Nutch conf to a higher value, and I've also tried using fewer threads (went down from 100 to 50), but with no real improvement. This is using the nutch-0.8-dev version. Any help would be immensely appreciated.
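In case it helps anyone reproduce this, the change I made was along these lines in conf/nutch-site.xml, inside the top-level configuration element (just a sketch; http.max.delays and fetcher.server.delay are the property names as they appear in my nutch-default.xml, and the values below are only examples):

  <property>
    <name>http.max.delays</name>
    <!-- how many times a fetcher thread will wait on a busy host
         before giving up with a max.delays.exceeded error -->
    <value>1000</value>
  </property>

  <property>
    <name>fetcher.server.delay</name>
    <!-- seconds a thread waits between attempts on the same host -->
    <value>2.0</value>
  </property>

  <property>
    <name>fetcher.threads.fetch</name>
    <!-- fewer threads, so fewer of them pile up waiting on the same hosts -->
    <value>50</value>
  </property>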

I've seen something similar with 0.7.1. Unfortunately, in my case it seemed to be the ISP causing the trouble, I suspect in their DNS resolution. I tried the same crawl on another machine with a different ISP, and it went through much more smoothly. Like night and day.

--MDC
