Hi,

 I have been experiencing fetch problems for several days.

 My fetch logs end with:

 "fetcher.Fetcher - Aborting with 50 hung threads."

 In nutch-site.xml I have the property fetcher.timelimit.mins set to 360 (= 
6 hours), but my fetches end after approximately 2 hours instead. (Note that 
enough pages are generated.)

 I noticed that although I have the property fetcher.server.delay set to 
5.0, some sites have been fetched several times within the same second. Maybe 
every site rejected my crawler after some time because of this behaviour?
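
 A minimal sketch of how the repeated fetches could be confirmed from the 
fetch log. It assumes the log contains lines of the form 
"<date> <time>,<ms> INFO  fetcher.Fetcher - fetching <url>"; that line shape, 
the function name find_delay_violations and the sample hosts are assumptions 
for illustration, not something taken from my actual logs. It only compares 
the start times of consecutive fetches per host, so it is a rough check rather 
than an exact model of how the fetcher applies fetcher.server.delay.

```python
import re
from datetime import datetime
from urllib.parse import urlparse

# Assumed log line shape (hypothetical, adjust to the real log file):
# 2011-03-01 12:00:00,123 INFO  fetcher.Fetcher - fetching http://a.example/1
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*fetching (\S+)"
)


def find_delay_violations(lines, min_delay_secs=5.0):
    """Return (host, gap_in_seconds) for consecutive fetches of the same
    host that started less than min_delay_secs apart.

    Timestamps are compared at whole-second resolution; the milliseconds
    in the log line are ignored.
    """
    last_seen = {}   # host -> start time of the previous fetch
    violations = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a "fetching <url>" line
        started = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        host = urlparse(match.group(2)).netloc
        previous = last_seen.get(host)
        if previous is not None:
            gap = (started - previous).total_seconds()
            if gap < min_delay_secs:
                violations.append((host, gap))
        last_seen[host] = started
    return violations


if __name__ == "__main__":
    import sys
    # Usage: python check_delays.py < fetch.log
    for host, gap in find_delay_violations(sys.stdin, min_delay_secs=5.0):
        print(f"{host}: fetched again after {gap:.0f}s")
```

 If this reports many hosts, the delay is really not being honoured; if it 
reports none, the repeated fetches I saw may have another explanation.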

 I don't have any exceptions in the logs.

 The hints I found on the internet suggest that this is a problem with 
crawl delays and robots.txt. I crawl 33 sites and only 4 of them have crawl 
delays, so I don't think that is the cause. However, this problem began around 
the same time I changed fetcher.max.crawl.delay from -1 to 100000. I had 
noticed that -1 doesn't work well: sites with crawl delays aren't fetched. 
With a large value, my sites with high crawl delays are fetched correctly, but 
the aborting problem appeared at approximately the same time. I have just set 
fetcher.max.crawl.delay to 1000, hoping it will change something, but I don't 
see a real link between this setting and the aborting problem, which affects 
all sites. I posted another mail about the fetcher.max.crawl.delay = -1 issue; 
maybe that bug also affects the fetching of sites without any crawl delay.

 I use Nutch 1.2.

 Thanks for helping.
