I had a similar issue before with Nutch 1.2 and "10 hung threads".

It happened when I changed the code of HttpResponse.java: I tried reconnecting/re-authenticating after receiving an HTTP 500 error code. After removing that code, everything went back to normal. It's probably not the same problem you're having, but I hope it helps.
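For context, the change I had made looked roughly like the sketch below. It is reconstructed from memory rather than taken from the actual Nutch source, and fetchStatus() is a hypothetical stand-in for the HTTP round trip done in HttpResponse.java. The point is just that an unbounded retry on 500 traps the fetcher thread, which the Fetcher then reports as hung:

/**
 * Minimal sketch (not actual Nutch source) of the retry-on-500 pattern
 * that produced my hung threads. fetchStatus() is a hypothetical
 * stand-in for the HTTP round trip done in HttpResponse.java.
 */
public class RetryOn500Sketch {

  // Pretend the server fails twice and then succeeds.
  private static int calls = 0;

  static int fetchStatus(String url) {
    return ++calls < 3 ? 500 : 200;
  }

  // The problematic version: no retry cap. Against a server that keeps
  // answering 500, this loop never exits, and the Fetcher eventually
  // aborts the thread and counts it as "hung".
  static int fetchBroken(String url) {
    int code = fetchStatus(url);
    while (code == 500) {
      // re-authenticate here, then try again -- forever if 500 persists
      code = fetchStatus(url);
    }
    return code;
  }

  // A bounded variant at least terminates; removing the retry entirely
  // is what fixed it for me.
  static int fetchBounded(String url) {
    int code = fetchStatus(url);
    for (int attempt = 0; code == 500 && attempt < 3; attempt++) {
      code = fetchStatus(url);
    }
    return code;
  }

  public static void main(String[] args) {
    System.out.println(fetchBounded("http://example.com/")); // prints 200
  }
}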
Remi

On Fri, Feb 17, 2012 at 3:55 PM, Danicela nutch <[email protected]> wrote:

> Hi,
>
> I have been experiencing fetch problems for several days.
>
> My fetch logs end with:
>
> "fetcher.Fetcher - Aborting with 50 hung threads."
>
> In nutch-site.xml I have fetcher.timelimit.mins set to 360 (= 6 hours),
> but my fetches end after approximately 2 hours instead (note that enough
> pages are generated).
>
> I noticed that although fetcher.server.delay is set to 5.0, some sites
> have been fetched several times within the same second. Maybe every site
> rejected my crawler after some time because of this behaviour?
>
> I don't have any exceptions in the logs.
>
> The hints I found on the internet suggest this is a problem of crawl
> delays and robots.txt. I crawl 33 sites and only 4 have crawl delays, so
> I don't think that's the problem. In fact, the problem began around the
> time I changed fetcher.max.crawl.delay from -1 to 100000. I had noticed
> that -1 doesn't work well and that sites with crawl delays weren't
> fetched; with a big value my high-crawl-delay sites are fetched
> correctly, but the aborting problem appeared at roughly the same time. I
> have just set fetcher.max.crawl.delay to 1000 hoping it changes
> something, although I don't see a real link between this setting and the
> aborting problem, which affects all sites. I posted another mail about
> the fetcher.max.crawl.delay = -1 issue; maybe that bug also affects the
> fetching of sites without any crawl delay.
>
> I use Nutch 1.2.
>
> Thanks for helping.
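P.S. For reference, the fetcher properties you mention live in conf/nutch-site.xml and, with the values from your mail, would look roughly like this. The comments are my reading of what each setting does, so double-check them against nutch-default.xml:

<property>
  <name>fetcher.timelimit.mins</name>
  <!-- give up fetching after 6 hours; -1 disables the limit -->
  <value>360</value>
</property>
<property>
  <name>fetcher.server.delay</name>
  <!-- seconds to wait between successive requests to the same server -->
  <value>5.0</value>
</property>
<property>
  <name>fetcher.max.crawl.delay</name>
  <!-- skip pages whose robots.txt Crawl-Delay exceeds this many seconds -->
  <value>1000</value>
</property>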