Hi,

I've observed an interesting phenomenon that is not hard to reproduce and that 
I think should not be happening:

To reproduce: with N fetcher threads, inject, say, 2xN URLs of VERY large files
plus a few smaller files to fetch, and run something that uses
org.apache.nutch.crawl.Crawl. The big files will take forever to download, and
the threads will be killed. The process will then proceed to the indexing
stage. However, you will see fetcher thread output in the logs intermixed with
the output of the indexer. This shows that the threads were not terminated
properly (or at all?).
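
To illustrate the kind of behaviour I suspect, here is a minimal stand-alone
sketch. It is not Nutch code: the class and method names are hypothetical, and
it assumes the caller simply stops waiting for a worker after a timeout rather
than interrupting and joining it. Under that assumption the worker keeps running
(and would keep logging) while the next stage executes.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AbandonedFetcherDemo {

    // Hypothetical stand-in for a fetcher thread stuck on a very large download.
    static Thread startSlowFetcher(AtomicInteger progress) {
        Thread t = new Thread(() -> {
            // Keep "downloading" until someone interrupts this thread.
            while (!Thread.currentThread().isInterrupted()) {
                progress.incrementAndGet();
                sleepQuietly(10);
            }
        }, "fetcher-demo");
        t.setDaemon(true); // do not keep the JVM alive in this demo
        t.start();
        return t;
    }

    // Sleep helper that restores the interrupt flag instead of throwing.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Join helper that restores the interrupt flag instead of throwing.
    static void joinQuietly(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The caller waits with a timeout and then moves on without stopping the worker.
    static boolean keepsRunningAfterTimeout() {
        AtomicInteger progress = new AtomicInteger();
        Thread fetcher = startSlowFetcher(progress);
        joinQuietly(fetcher, 100);          // give up after the timeout
        int atTimeout = progress.get();
        sleepQuietly(200);                  // the "indexing" stage runs here
        boolean stillWorking = fetcher.isAlive() && progress.get() > atTimeout;
        fetcher.interrupt();                // clean up the demo thread
        joinQuietly(fetcher, 1000);
        return stillWorking;
    }

    // The caller interrupts the worker and waits for it before moving on.
    static boolean stopsWhenInterruptedAndJoined() {
        Thread fetcher = startSlowFetcher(new AtomicInteger());
        sleepQuietly(50);
        fetcher.interrupt();
        joinQuietly(fetcher, 1000);
        return !fetcher.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("still running after timeout: " + keepsRunningAfterTimeout());
        System.out.println("stopped after interrupt+join: " + stopsWhenInterruptedAndJoined());
    }
}
```

If something similar is going on in the fetcher, interrupting the threads and
waiting for them to finish before starting the indexer would be one way to avoid
the interleaved output, though I have not checked how the real code handles this.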

Regards,

Arkadi
