Emmanuel wrote:
The fetcher.server.min.delay value exists precisely to prevent overly rapid
crawling when you set the number of threads per host > 1.

==> Thanks for the details; I have a better picture now. But I'm still
wondering why it is set to 0 by default. Shouldn't we set it to at least 5s
in the code, or even in the XML config file?


The assumption is that crawling with threads per host > 1 means that your target is a site that you control, and it can withstand any load. ;) That's why the default value is 0.0, which means "crawl as fast as possible, I'm in a hurry".
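
If the target is not a site you control, a minimal sketch of an override in conf/nutch-site.xml could look like the fragment below. The 5.0 second value is just the figure suggested above, and the thread count of 2 is purely illustrative; neither is a recommended setting.

```xml
<!-- Sketch of a nutch-site.xml override; the values are illustrative assumptions. -->
<configuration>
  <property>
    <name>fetcher.threads.per.host</name>
    <!-- More than 1 thread per host: this is the case where min.delay applies. -->
    <value>2</value>
  </property>
  <property>
    <name>fetcher.server.min.delay</name>
    <!-- Minimum delay in seconds between successive requests to the same server. -->
    <value>5.0</value>
  </property>
</configuration>
```

With fetcher.threads.per.host left at 1, fetcher.server.min.delay should have no effect, so the override only matters when several threads hit the same host.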

I agree with you that this property needs to be documented. It is now - see rev. 575360.



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
