Hi,

This is likely related to https://issues.apache.org/jira/browse/NUTCH-719 (see the post from S Dennis for the solution). The totalSize counter was getting out of sync with the actual content of the fetch queues, causing the Fetcher to wait idly before timing out and aborting. Nutch 0.9 uses a different fetcher implementation, which is why it did not show this problem.
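To illustrate the failure mode (a simplified sketch, not the actual Nutch source): the 1.0 Fetcher tracks queue size in a separate counter alongside its per-host queues, so any code path that removes or drops items without decrementing that counter leaves the fetcher believing there is still work to do. The class and method names below are made up for the example.

  import java.util.ArrayDeque;
  import java.util.Queue;
  import java.util.concurrent.atomic.AtomicInteger;

  class QueuesSketch {
      private final Queue<String> queue = new ArrayDeque<String>();
      // Size is tracked separately from the queue itself, as in
      // the 1.0 Fetcher's fetchQueues.totalSize.
      private final AtomicInteger totalSize = new AtomicInteger(0);

      synchronized void add(String url) {
          queue.add(url);
          totalSize.incrementAndGet();
      }

      synchronized String next() {
          String url = queue.poll();
          if (url != null) totalSize.decrementAndGet();
          return url;
      }

      // Buggy path: empties the queue but forgets the counter,
      // so totalSize stays > 0 with nothing left to fetch.
      synchronized void buggyClear() {
          queue.clear();
      }

      public static void main(String[] args) {
          QueuesSketch q = new QueuesSketch();
          q.add("http://example.com/");
          q.buggyClear();
          // Prints totalSize=1 even though the queue is empty,
          // mirroring "fetchQueues.totalSize=1" in your log: the
          // threads spin-wait on work that does not exist until
          // the hung-thread timeout aborts the job.
          System.out.println("totalSize=" + q.totalSize.get());
      }
  }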
That said, what makes you think that it stops prematurely? Aren't you getting all the expected URLs?

HTH

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com

2009/10/2 Vijay <vijay.stanf...@gmail.com>

> Hi all,
>
> I am trying to use Nutch to crawl and index a list of about 50K URLs
> with depth=1. I am running indexing with the command:
> nutch-1.0/bin/nutch crawl urls/ -depth 1 -topN 100000
> with appropriate changes to the configuration files.
>
> I find that the fetching always terminates prematurely and the logs show
> an error that looks like:
>
> ----------------------------------------------------------------
> activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1
> Aborting with 200 hung threads.
> Fetcher: done
> ----------------------------------------------------------------
>
> I have not seen this particular error message when using Nutch 0.9. Is it
> advisable to revert to using Nutch 0.9? Or is there some kind of patch to
> fix this error?
>
> Thanks,
> Vijay