Doug Cutting wrote:
> Lukas Vlcek wrote:
>> However, this leads me to the question: what exactly is the
>> fetcher.threads.per.host value used for? More specifically, what
>> does *host* mean in the Nutch configuration world?
> In this case, a host is an IP address.

I've thought about this more, and wonder if perhaps this should be
switched so that host names are blocked from simultaneous fetching rather
than IP addresses. I recently spoke with Carlos Castillo, author of the
WIRE crawler (http://www.cwr.cl/projects/WIRE/) and it blocks hosts by
name, not IP. What do others think?
Doug
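
To make the difference between the two policies concrete, here is a minimal sketch in Python. It is not Nutch or WIRE code: the function names, the `by_ip` flag, and the fake DNS table are all hypothetical and exist only for illustration. It groups URLs by the key that would be used to limit simultaneous fetches, either the resolved IP address or the host name.

```python
from urllib.parse import urlsplit

# Hypothetical DNS table so the example runs without network access.
# Two virtual hosts share one IP address; a third host has its own.
FAKE_DNS = {
    "a.example.com": "192.0.2.10",
    "b.example.com": "192.0.2.10",
    "c.example.org": "192.0.2.20",
}


def politeness_key(url, by_ip, resolve):
    """Return the key under which simultaneous fetches are limited.

    by_ip=True  -> key is the resolved IP address
    by_ip=False -> key is the (lowercased) host name
    """
    host = urlsplit(url).hostname  # already lowercased by urlsplit
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return resolve(host) if by_ip else host


def group_urls(urls, by_ip, resolve=FAKE_DNS.__getitem__):
    """Group URLs by politeness key; each group shares one per-host limit."""
    groups = {}
    for url in urls:
        groups.setdefault(politeness_key(url, by_ip, resolve), []).append(url)
    return groups


urls = [
    "http://a.example.com/index.html",
    "http://b.example.com/about.html",
    "http://c.example.org/",
]

by_ip_groups = group_urls(urls, by_ip=True)
by_name_groups = group_urls(urls, by_ip=False)
```

With these made-up addresses, blocking by IP puts a.example.com and b.example.com in the same group, so they compete for the same per-host thread limit, while blocking by name treats them as separate hosts that can be fetched at the same time even though they sit on one machine.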