Alan Perkins wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Hit rate
> This directive could indicate to a robot how long to wait between
> requests to the server. Currently it is accepted practice to wait at
> least 30 seconds between requests, but this is too fast for some sites,
> too slow for others.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> What's the current accepted practice for hit rate? Does it vary
> according to
With the availability of persistent connections, a robot that drops the
connection, or keeps the connection open for 30 seconds without requesting
another resource, does not do the server any good. Large sites generally
have good connectivity, and robots can request resources at a higher rate
without any performance degradation, regardless of response code. If a
robot does find that a site is responding slowly (in latency or
throughput), it should reduce the hit rate or even suspend crawling
temporarily to avoid overloading the server.

> A) The HTTP response (e.g. no need to wait 30 seconds after a 304)

I would recommend waiting only after the server has closed the connection
(i.e. has not maintained the persistent connection). As long as the
connection is open, sending another request is a better choice than
waiting and keeping the connection open but inactive.

> B) The number of robots you are running (e.g. 30 seconds per site per
> robot, or 30 seconds per site across all your robots?)

Running multiple robots in parallel increases the number of open
connections required at the server; a single persistent connection is
more server-friendly (and usually easier to manage too; I see some sites
crawled by parallel robots fetching the same resources, which apparently
do not communicate status information in real time).

> C) The number of active robots on the Web (e.g. 1000 robots isn't many,
> 10 million robots is - and if too many unrelated robots hit a site,
> that's another effective DDOS attack)

The number of other robots hitting a site is not a known factor, although
performance metrics can give an indication of whether or not a site is
under heavy load.

> D) Some other factor (e.g. server response time, etc.)
> E) None of the above (i.e. anything goes)

DoS monitors may raise alerts or block traffic if robots hit a site too
hard, too frequently, with too many parallel processes, etc.

--
Klaus Johannes Rusch
[EMAIL PROTECTED]
http://www.atmedia.net/KlausRusch/
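
A minimal sketch of the adaptive, single-persistent-connection approach
described above, using Python's standard http.client. The host, paths,
baseline delay and back-off thresholds are placeholder assumptions for
illustration, not values taken from the discussion:

  # Illustrative sketch: one persistent connection, adaptive hit rate.
  import time
  import http.client

  HOST = "example.com"            # placeholder host
  PATHS = ["/", "/a.html", "/b.html"]

  BASE_DELAY = 1.0                # assumed delay when the server is fast
  SLOW_THRESHOLD = 2.0            # assumed "server looks loaded" latency
  MAX_DELAY = 60.0                # assumed upper bound for back-off

  def crawl():
      # One persistent HTTP/1.1 connection instead of parallel robots.
      conn = http.client.HTTPConnection(HOST, timeout=30)
      delay = BASE_DELAY
      for path in PATHS:
          start = time.monotonic()
          try:
              conn.request("GET", path,
                           headers={"User-Agent": "example-robot/0.1"})
              resp = conn.getresponse()
              resp.read()         # drain the body so the connection can be reused
          except (http.client.HTTPException, OSError):
              # Server dropped the connection: reopen it and slow down.
              conn.close()
              conn = http.client.HTTPConnection(HOST, timeout=30)
              delay = min(delay * 2, MAX_DELAY)
              time.sleep(delay)
              continue
          elapsed = time.monotonic() - start
          # Back off when responses are slow, return to the baseline otherwise.
          if elapsed > SLOW_THRESHOLD:
              delay = min(delay * 2, MAX_DELAY)
          else:
              delay = BASE_DELAY
          time.sleep(delay)
      conn.close()

The loop keeps reusing the same connection as long as the server keeps it
open, and only slows down (or would suspend entirely, in a fuller
implementation) when the server responds slowly or closes the connection.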