So most other crawlers use the hostname, not the IP. That's good to
know.
Google and Yahoo, yes. I am not sure about the others.
Perhaps a dynamic property would help. If the elapsed time of the
previous request is less than some fraction of the delay, then we might
lessen the delay. Similarly, if it is greater, or if we get 503s, then
we might increase it. For example, if the fraction were 0.5 and the
delay were 2 seconds, then sites which respond in less than a second
would get their delay decreased, and sites which take more than a second
or return 503 would have their delay increased. Do you think this would
be effective with your site?
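
To make the proposal concrete, here is a rough sketch in Python rather
than actual Nutch code. The function name, the multiplicative step of
1.5 and the min/max bounds are all assumptions made for illustration;
only the fraction (0.5), the 2 second delay and the 503 rule come from
the description above.

```python
def adjust_delay(delay, elapsed, status, fraction=0.5, step=1.5,
                 min_delay=0.5, max_delay=30.0):
    """Return the next per-site delay in seconds.

    delay    -- current delay between requests to the site
    elapsed  -- how long the previous request took
    status   -- HTTP status code of the previous response
    fraction -- threshold, expressed as a fraction of the delay
    step, min_delay, max_delay -- hypothetical tuning values
    """
    threshold = fraction * delay
    if status == 503 or elapsed > threshold:
        # Slow or overloaded site: back off.
        new_delay = delay * step
    elif elapsed < threshold:
        # Fast site: crawl a little more aggressively.
        new_delay = delay / step
    else:
        new_delay = delay
    # Keep the delay within sane bounds either way.
    return max(min_delay, min(max_delay, new_delay))
```

With the defaults, a site that answers in 0.4 seconds against a 2 second
delay gets a shorter delay, while one that takes 1.5 seconds or returns
a 503 gets a longer one.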
Adjusting the number of downloads dynamically according to the response
time would be great.
But what is the advantage of doing this per unique host name?
If there is no real reason to do so, I would do it dynamically per IP or
per second-level domain, but not per subdomain.
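
The difference between the two groupings can be sketched like this, in
Python and not taken from Nutch. Both function names are made up for the
example, and taking the last two labels is a deliberately naive
assumption: it is wrong for suffixes such as co.uk and for IP literals.

```python
from urllib.parse import urlsplit


def host_key(url):
    """Group by full host name, so every subdomain has its own delay."""
    return urlsplit(url).hostname


def domain_key(url):
    """Group by second-level domain (naive: last two labels only)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


# One delay entry per key: with domain_key, all subdomains of a site
# share a single entry instead of each being crawled at full rate.
delays = {}
for url in ("http://a.example.com/x", "http://b.example.com/y"):
    delays.setdefault(domain_key(url), 2.0)
```

Here both URLs end up under the single key example.com, whereas
host_key would give a.example.com and b.example.com separate delays.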
Matthias
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general