Are you crawling just a single site? If so, what is
fetcher.threads.per.host set to? It defaults to one, and the fetcher can
only use multiple threads effectively on a single site when
fetcher.threads.per.host is greater than one. Otherwise the threads
conflict with one another and fail to fetch pages.
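
As a rough sketch of what raising that setting could look like: this assumes your install reads overrides from a site configuration file (nutch-site.xml in the setups I know of) and uses the usual name/value property layout. The value 3 is an arbitrary illustration, so check the default configuration file shipped with your version for the exact form.

```xml
<!-- Illustrative only: 3 is an arbitrary example value, not a recommendation. -->
<!-- Assumed to sit inside the root element of your site configuration file. -->
<property>
  <name>fetcher.threads.per.host</name>
  <value>3</value>
</property>
```

Allowing more threads per host also means more simultaneous requests to that one server, so it is probably worth raising the value gradually.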
Doug
Jakob Heidebrecht wrote:
Hello,
Is there a problem with fetching using many threads?
I injected a single URL into the DB and ran three fetch cycles in each of three cases.
First case: 1 fetcher thread; second and third cases: 20 fetcher threads.
In the first case I got 102 pages,
in the second 19 pages, and
in the third 22 pages.
Everything else was the same in all three cases.
Is this a bug?
Could the server be kicking me out when I fetch from it with many threads at the
same time?
Regards,
Jakob
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers