I had a similar issue before with Nutch-1.2 and "10 hung threads".

It happened after I changed the code in HttpResponse.java to retry
reconnecting/authenticating after an HTTP 500 error code. After removing
that retry code, everything went back to normal.
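For illustration only (this is a hypothetical sketch, not the actual Nutch code), the dangerous shape is an unbounded retry loop: each retry blocks the fetcher thread, so a server that keeps answering 500 keeps the thread busy forever, and the fetcher eventually aborts with hung threads. Bounding the attempts avoids that:

```java
public class RetrySketch {
    // Simulated server that always answers HTTP 500.
    static int fetch() {
        return 500;
    }

    // Problematic shape (never exits if the server keeps failing):
    //   while (fetch() == 500) { /* reconnect, authenticate, retry */ }
    //
    // Safer shape: cap the retries and give up instead of hanging the thread.
    static boolean fetchWithRetries(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fetch() != 500) {
                return true; // success
            }
        }
        return false; // give up; the thread is released
    }

    public static void main(String[] args) {
        System.out.println(fetchWithRetries(3));
    }
}
```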

It's probably not the same problem for you, but I hope it helps.

Remi

On Fri, Feb 17, 2012 at 3:55 PM, Danicela nutch <[email protected]> wrote:

> Hi,
>
>  I've been experiencing fetch problems for several days.
>
>  My fetch logs end with :
>
>  "fetcher.Fetcher - Aborting with 50 hung threads."
>
>  In nutch-site.xml I have the property fetcher.timelimit.mins set to 360
> (= 6 hours), but my fetches end after approximately 2 hours instead (note
> that enough pages are generated).
>
>  I noticed that although I have the property fetcher.server.delay set to
> 5.0, some sites have been fetched several times within the same second.
> Maybe every site rejected my crawler after some time because of this
> behaviour?
>
>  I don't have any exceptions in the logs.
>
>  The hints I found on the Internet suggest this is a problem with crawl
> delays and robots.txt. I crawl 33 sites, and only 4 have crawl delays, so
> I don't think that is the problem. However, this problem began around the
> time I changed fetcher.max.crawl.delay from -1 to 100000. I had noticed
> that -1 doesn't work well: sites with crawl delays aren't fetched at all.
> With a large value, my high-crawl-delay sites are fetched correctly, but
> the aborting problem appeared at approximately the same time. I have just
> set fetcher.max.crawl.delay to 1000 hoping it will change something, but
> I don't see a real link between this setting and the aborting problem,
> which affects all sites. I posted another mail about the
> fetcher.max.crawl.delay = -1 issue; maybe that bug also affects the
> fetching of sites without any crawl delay.
>
>  I use Nutch 1.2.
>
>  Thanks for helping.
>
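For reference, the three properties discussed above are configured in nutch-site.xml. A minimal fragment with the values quoted in the mail might look like this (property names as in Nutch 1.x; the comments are my reading of their meaning, not text from the mail):

```xml
<property>
  <name>fetcher.timelimit.mins</name>
  <value>360</value> <!-- abort the fetch cycle after 6 hours -->
</property>
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value> <!-- seconds to wait between requests to the same host -->
</property>
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>1000</value> <!-- skip pages whose robots.txt Crawl-Delay exceeds this many seconds -->
</property>
```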
