Doug Cutting wrote:
Lukas Vlcek wrote:

However, this leads me to the question: what exactly is the
fetcher.threads.per.host value used for? More specifically, what does
*host* mean in the Nutch configuration world?


In this case, a host is an IP address.

I've thought about this more, and wonder if perhaps this should be switched so that host names, rather than IP addresses, are blocked from simultaneous fetching. I recently spoke with Carlos Castillo, author of the WIRE crawler (http://www.cwr.cl/projects/WIRE/), and it blocks hosts by name, not by IP. What do others think?

Doug
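
To make the difference between the two policies concrete, here is a minimal sketch in Java. It is not Nutch's fetcher code: the class HostKeySketch, the Mode enum, the Limiter class, and the resolver function are all hypothetical names introduced for illustration, and the resolver stands in for a real DNS lookup. The only thing that changes between the two policies is the key under which simultaneous fetches are counted.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not taken from the Nutch source.
public class HostKeySketch {

    // The two candidate meanings of "host" discussed in this thread.
    enum Mode { BY_IP, BY_NAME }

    // Returns the key under which simultaneous fetches are counted.
    // The resolver maps a host name to an IP string (assumed to stand in for DNS).
    static String blockingKey(String url, Mode mode, Function<String, String> resolver) {
        String host = URI.create(url).getHost().toLowerCase();
        return mode == Mode.BY_IP ? resolver.apply(host) : host;
    }

    // Allows at most maxPerKey simultaneous fetches per key.
    static class Limiter {
        private final int maxPerKey;
        private final Map<String, Integer> active = new HashMap<>();

        Limiter(int maxPerKey) {
            this.maxPerKey = maxPerKey;
        }

        // Returns true and records the fetch if the key is under its limit.
        synchronized boolean tryAcquire(String key) {
            int n = active.getOrDefault(key, 0);
            if (n >= maxPerKey) {
                return false;
            }
            active.put(key, n + 1);
            return true;
        }

        // Marks one fetch for the key as finished.
        synchronized void release(String key) {
            Integer n = active.get(key);
            if (n == null) {
                return;
            }
            if (n <= 1) {
                active.remove(key);
            } else {
                active.put(key, n - 1);
            }
        }
    }

    public static void main(String[] args) {
        // Two virtual hosts that share one IP address (documentation range 192.0.2.0/24).
        Map<String, String> dns = Map.of(
            "a.example.com", "192.0.2.1",
            "b.example.com", "192.0.2.1");
        String u1 = "http://a.example.com/page";
        String u2 = "http://b.example.com/page";

        for (Mode mode : Mode.values()) {
            Limiter limiter = new Limiter(1); // analogous to one thread per host
            boolean first = limiter.tryAcquire(blockingKey(u1, mode, dns::get));
            boolean second = limiter.tryAcquire(blockingKey(u2, mode, dns::get));
            System.out.println(mode + ": first=" + first + ", second=" + second);
        }
    }
}
```

With a limit of one, blocking by IP refuses the second fetch because both names resolve to the same address, while blocking by name lets both proceed. Blocking by IP is the more conservative choice for shared hosting; blocking by name avoids a DNS lookup before the politeness check.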


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
