Hi,

This is likely related to https://issues.apache.org/jira/browse/NUTCH-719 (see the post from S Dennis for the solution). The totalSize counter was getting out of sync with the actual content of the fetch queues, causing the Fetcher to wait idly before timing out and aborting. Nutch 0.9 uses a different fetcher implementation, which is why it did not show this problem.
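To illustrate the failure mode (a simplified sketch, not the actual Nutch source): the 1.0 Fetcher tracks queue size in a separate counter alongside its per-host queues, so any code path that removes or drops items without decrementing that counter leaves the fetcher believing there is still work to do. The class and method names below are made up for the example.

  import java.util.ArrayDeque;
  import java.util.Queue;
  import java.util.concurrent.atomic.AtomicInteger;

  class QueuesSketch {
      private final Queue<String> queue = new ArrayDeque<String>();
      // Size is tracked separately from the queue itself, as in
      // the 1.0 Fetcher's fetchQueues.totalSize.
      private final AtomicInteger totalSize = new AtomicInteger(0);

      synchronized void add(String url) {
          queue.add(url);
          totalSize.incrementAndGet();
      }

      synchronized String next() {
          String url = queue.poll();
          if (url != null) totalSize.decrementAndGet();
          return url;
      }

      // Buggy path: empties the queue but forgets the counter,
      // so totalSize stays > 0 with nothing left to fetch.
      synchronized void buggyClear() {
          queue.clear();
      }

      public static void main(String[] args) {
          QueuesSketch q = new QueuesSketch();
          q.add("http://example.com/");
          q.buggyClear();
          // Prints totalSize=1 even though the queue is empty,
          // mirroring "fetchQueues.totalSize=1" in your log: the
          // threads spin-wait on work that does not exist until
          // the hung-thread timeout aborts the job.
          System.out.println("totalSize=" + q.totalSize.get());
      }
  }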
That said, what makes you think that it stops prematurely? Aren't you getting all the expected URLs?

HTH

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com

2009/10/2 Vijay <vijay.stanf...@gmail.com>

> Hi all,
>
> I am trying to use Nutch to crawl and index a list of about 50K URLs
> with depth=1. I am running indexing with the command:
> nutch-1.0/bin/nutch crawl urls/ -depth 1 -topN 100000
> with appropriate changes to the configuration files.
>
> I find that the fetching always terminates prematurely and the logs show
> an error that looks like:
>
> ----------------------------------------------------------------
> activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1
> Aborting with 200 hung threads.
> Fetcher: done
> ----------------------------------------------------------------
>
> I have not seen this particular error message when using Nutch 0.9. Is it
> advisable to revert to using Nutch 0.9? Or is there some kind of patch to
> fix this error?
>
> Thanks,
> Vijay