Emmanuel wrote:
>> The fetcher.server.min.delay value is there precisely to prevent overly
>> rapid crawling if you set the number of threads per host > 1.
> ==> Thanks for the details; I have a clearer picture now. But I'm still
> wondering why it is set to 0 by default. Shouldn't we set it to at least
> 5 s, either in the code or in the XML config file?
The assumption is that crawling with more than one thread per host means
your target is a site that you control, and that it can withstand any load.
;) That's why the default value is 0.0, which means "crawl as fast as
possible, I'm in a hurry".
I agree with you that this property needs to be documented, and now it
is; see rev. 575360.
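To make the semantics concrete, here is a minimal sketch of what a per-host minimum delay does when several threads fetch from the same host: each request to a host is scheduled to start at least min_delay seconds after the previous one, and a value of 0.0 disables the wait entirely. The class name HostThrottle and its methods are hypothetical and only illustrate the idea behind fetcher.server.min.delay; this is not Nutch's fetcher code (which is Java and may schedule requests differently).

```python
import threading
import time
from typing import Callable, Dict


class HostThrottle:
    """Hypothetical per-host gate (not Nutch's implementation).

    Successive requests to the same host start at least `min_delay`
    seconds apart, however many threads fetch from that host.
    min_delay=0.0 means "no waiting", i.e. crawl as fast as possible.
    """

    def __init__(self, min_delay: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_delay = min_delay
        self._clock = clock          # injectable so the logic can be tested
        self._sleep = sleep
        self._lock = threading.Lock()
        # host -> earliest time at which the next request may start
        self._next_slot: Dict[str, float] = {}

    def acquire(self, host: str) -> float:
        """Block until the caller may start a request to `host`.

        Returns the start time reserved for this request.
        """
        with self._lock:
            now = self._clock()
            start = max(now, self._next_slot.get(host, now))
            # Reserve the slot, then push the next one min_delay later.
            self._next_slot[host] = start + self.min_delay
        # Sleep outside the lock so other hosts are not held up.
        wait = start - now
        if wait > 0:
            self._sleep(wait)
        return start
```

With HostThrottle(5.0), three threads calling acquire("example.com") at the same moment would start their requests roughly 0, 5 and 10 seconds later, while a request to a different host is not delayed. Reserving the slot under the lock and sleeping outside it keeps each host's schedule serialized without making unrelated hosts wait.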
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com