So most other crawlers use the hostname, not the IP. That's good to
know.

Google and Yahoo, yes. About the others I am not sure.

Perhaps a dynamic property would help. If the elapsed time of the previous request is less than some fraction of the delay, then we might lessen the delay. Similarly, if it is greater, or if we get 503s, then we might increase it. For example, if the fraction were 0.5 and the delay 2 seconds, then sites which respond in less than a second would have their delay decreased, and sites which take more than a second or return 503 would have their delay increased. Do you think this would be effective with your site?
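
The rule described above could be sketched roughly as follows. This is only an illustration, not existing Nutch code: the class name AdaptiveDelay, the multiplicative step, and the min/max bounds are all assumptions, and the threshold is taken as the fraction of the configured base delay (the proposal leaves open whether the base or the current delay is meant). The key is an opaque string, so the caller decides whether delays are tracked per host name, per IP, or per second-level domain.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not part of Nutch): adapts the fetch delay per key. */
class AdaptiveDelay {
    private final long baseDelayMs;  // configured delay, e.g. 2000 ms
    private final double fraction;   // threshold fraction, e.g. 0.5
    private final double step;       // assumed factor used to grow or shrink the delay
    private final long minDelayMs;   // assumed lower bound
    private final long maxDelayMs;   // assumed upper bound
    private final Map<String, Long> delays = new HashMap<>();

    AdaptiveDelay(long baseDelayMs, double fraction, double step,
                  long minDelayMs, long maxDelayMs) {
        this.baseDelayMs = baseDelayMs;
        this.fraction = fraction;
        this.step = step;
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /**
     * Records the outcome of one request and returns the delay to apply
     * before the next request to the same key.
     */
    long update(String key, long elapsedMs, int httpStatus) {
        long current = delays.getOrDefault(key, baseDelayMs);
        long threshold = (long) (fraction * baseDelayMs);
        long next = current;
        if (httpStatus == 503 || elapsedMs > threshold) {
            // Slow response or server overload: back off.
            next = Math.min(maxDelayMs, (long) (current * step));
        } else if (elapsedMs < threshold) {
            // Fast response: fetch more aggressively.
            next = Math.max(minDelayMs, (long) (current / step));
        }
        delays.put(key, next);
        return next;
    }
}
```

With a base delay of 2000 ms and a fraction of 0.5, the threshold is one second, matching the example in the paragraph above: a 200 response that took 400 ms shortens the delay for that key, while a 503 or a response slower than one second lengthens it.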

Adjusting the download rate dynamically according to the response
time would be great.

But what is the advantage of doing this per unique host name?

If there is no real reason to do so, I would do it dynamically per IP or
per second-level domain, but not per subdomain.

Matthias



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
