Do you want them to crawl but just get throttled, or not crawl at all? If you really don't want to be crawled, you can alter the robots.txt file (http://www.robotstxt.org/) for your web server, or use a robots meta tag such as robots="nofollow" (http://www.robotstxt.org/meta.html). You could also publish a sitemap (http://sitemaps.org/) to hint at how often you'd like to be crawled.
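For well-behaved bots, the two approaches look something like this (a minimal sketch; the paths and agent names are just examples):

```
# robots.txt at the site root -- block everything for all spiders,
# or slow down a specific one with the non-standard Crawl-delay hint.
User-agent: *
Disallow: /

User-agent: Googlebot
Crawl-delay: 10
```

And the per-page meta tag variant, placed in the <head> of a page:

```html
<meta name="robots" content="noindex, nofollow">
```

Note that both of these are only advisory; malicious spiders ignore them, which is where the banning/throttling ideas below come in.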

-- Cole



Quoting phpninja <[EMAIL PROTECTED]>:

One thing just off the top of my head would be a list similar to a
c:\windows\system32\drivers\etc\hosts file, but for your website. You
could keep an array or a db table of malicious HTTP_USER_AGENT strings and,
if one matches, give that client an IP/site ban. A quick Google search gave me this
list: http://www.pgts.com.au/pgtsj/pgtsj0208d.html but there are
probably more.
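A rough sketch of that idea in PHP (the agent names here are illustrative; pull real ones from a list like the PGTS page above):

```php
<?php
// Illustrative blocklist of User-Agent substrings. In practice you
// might load these from a database table instead of a hard-coded array.
$banned_agents = array('EmailSiphon', 'WebZIP', 'larbin');

// Returns true if the client's User-Agent matches any banned entry
// (case-insensitive substring match).
function is_banned_agent($user_agent, $banned_agents) {
    foreach ($banned_agents as $bad) {
        if (stripos($user_agent, $bad) !== false) {
            return true;
        }
    }
    return false;
}

// In a real app you'd run this early in your bootstrap, before opening
// a DB connection or starting a session:
// if (is_banned_agent($_SERVER['HTTP_USER_AGENT'], $banned_agents)) {
//     header('HTTP/1.1 403 Forbidden');
//     exit;
// }
```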

Another thing could be the mod_throttle module (I think it only supports the
1.* versions of Apache, though):
http://gunnm.org/~soda/work/oldstuff/vhffs2/vhffs-modthrottle/doc/ . You can
set policies with it, such as limits on concurrent connections.
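A hypothetical httpd.conf fragment for that module; I'm writing the directive names from memory of the mod_throttle docs linked above, so verify them against that page before relying on this:

```
# Assumed mod_throttle configuration (Apache 1.3 only).
<IfModule mod_throttle.c>
    # Roughly: limit each client to 60 requests per 60-second window.
    ThrottlePolicy Request 60 60s
</IfModule>
```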

Regards,
-phpninja


On 2/22/08, Wade Preston Shearer <[EMAIL PROTECTED]> wrote:

Are there any effective methods for throttling bots and spiders to keep
them from hammering your site? If your sessions are database-based, how do
you keep them from bringing your server to its knees?

_______________________________________________

UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net


