Doug Cutting wrote:
Lukas Vlcek wrote:
However, this leads me to the question of what exactly the
fetcher.threads.per.host value is used for. More specifically, what does
*host* mean in the Nutch configuration world?
In this case, a host is an IP address.
I've thought about this more, and wonder if perhaps this should be
switched so that host names are blocked from simultaneous fetching rather
than IP addresses. I recently spoke with Carlos Castillo, author of the
WIRE crawler (http://www.cwr.cl/projects/WIRE/) and it blocks hosts by
name, not IP. What do others think?
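
To make the two options concrete, here is a minimal sketch of a per-host
limiter that can key on either the host name or the resolved IP address.
It is not the Nutch fetcher code: the class HostBlockingSketch, the Mode
enum, and the resolver function are all hypothetical names made up for
illustration, and the resolver is a plain lookup table so the example
runs without DNS.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical sketch, not the actual Nutch fetcher implementation.
class HostBlockingSketch {
    // How URLs are grouped when limiting simultaneous fetches.
    enum Mode { BY_NAME, BY_IP }

    private final Mode mode;
    private final int threadsPerHost;                 // plays the role of fetcher.threads.per.host
    private final Function<String, String> resolver;  // host name -> IP (assumed; stands in for DNS)
    private final Map<String, Semaphore> slots = new ConcurrentHashMap<>();

    HostBlockingSketch(Mode mode, int threadsPerHost, Function<String, String> resolver) {
        this.mode = mode;
        this.threadsPerHost = threadsPerHost;
        this.resolver = resolver;
    }

    // The blocking key: either the lower-cased host name or its IP address.
    String keyFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        host = host.toLowerCase(Locale.ROOT);
        return mode == Mode.BY_IP ? resolver.apply(host) : host;
    }

    // Returns true if a fetch slot for this URL's key was free and is now taken.
    boolean tryAcquire(String url) {
        return slots.computeIfAbsent(keyFor(url), k -> new Semaphore(threadsPerHost))
                    .tryAcquire();
    }

    // Gives the slot back once the fetch has finished.
    void release(String url) {
        Semaphore s = slots.get(keyFor(url));
        if (s != null) {
            s.release();
        }
    }

    public static void main(String[] args) {
        // Two virtual hosts that share one IP address (documentation range 192.0.2.0/24).
        Map<String, String> dns = Map.of(
            "a.example.com", "192.0.2.1",
            "b.example.com", "192.0.2.1");

        for (Mode mode : Mode.values()) {
            HostBlockingSketch limiter = new HostBlockingSketch(mode, 1, dns::get);
            boolean first = limiter.tryAcquire("http://a.example.com/page1");
            boolean second = limiter.tryAcquire("http://b.example.com/page2");
            System.out.println(mode + ": first=" + first + ", second=" + second);
        }
    }
}
```

With one slot per key, blocking by IP makes the two virtual hosts wait
for each other, while blocking by name lets them be fetched at the same
time even though they land on the same machine. That is the trade-off
being asked about.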
Doug
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general