Doug Cutting wrote:
Lukas Vlcek wrote:

However, this leads me to the question: what exactly is the
fetcher.threads.per.host value used for? More specifically, what
does *host* mean in the Nutch configuration world?


In this case, a host is an IP address.

I've thought about this more, and wonder if perhaps this should be switched so that host names, rather than IP addresses, are blocked from simultaneous fetching. I recently spoke with Carlos Castillo, author of the WIRE crawler (http://www.cwr.cl/projects/WIRE/), and it blocks hosts by name, not IP. What do others think?
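
To make the difference concrete, here is a rough sketch of the two keying choices. It is not the actual Nutch fetcher code: the HostLimiter class, its byIp flag, and the resolver function (a stand-in for a DNS lookup) are all hypothetical names used only for illustration. The maxPerHost field plays the role that fetcher.threads.per.host plays in the configuration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Nutch's real implementation.
class HostLimiter {
    private final int maxPerHost;   // analogous to fetcher.threads.per.host
    private final boolean byIp;     // true: block by IP address; false: block by host name
    private final Function<String, String> resolver; // host name -> IP (assumed DNS stand-in)
    private final Map<String, Integer> active = new HashMap<>();

    HostLimiter(int maxPerHost, boolean byIp, Function<String, String> resolver) {
        this.maxPerHost = maxPerHost;
        this.byIp = byIp;
        this.resolver = resolver;
    }

    // The key that decides which URLs count as "the same host".
    String keyFor(String url) {
        String host = URI.create(url).getHost().toLowerCase(Locale.ROOT);
        return byIp ? resolver.apply(host) : host;
    }

    // Returns true if a fetch of this URL may start now.
    synchronized boolean tryAcquire(String url) {
        String key = keyFor(url);
        int n = active.getOrDefault(key, 0);
        if (n >= maxPerHost) {
            return false;
        }
        active.put(key, n + 1);
        return true;
    }

    // Call when a fetch of this URL has finished.
    synchronized void release(String url) {
        active.computeIfPresent(keyFor(url), (k, n) -> n > 1 ? n - 1 : null);
    }
}
```

With a limit of 1, two virtual hosts that resolve to the same address block each other when keyed by IP, but can be fetched at the same time when keyed by name.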

Doug
