[EMAIL PROTECTED] wrote:
The following is the number of hits from MSN bot, from all MSN bot IP addresses, to my webserver (through ALL historical logs I still have around):
   1227 65.54.188.69
     58 65.54.188.70
     42 65.54.188.64
     18 65.54.188.68
      4 65.54.188.67
If I look at all traffic to my website, MSN bot is still on top:
   1227 65.54.188.69
    127 192.58.204.226
     59 65.54.188.70
     42 65.54.188.64
     29 64.244.30.79
     24 66.196.91.227
     19 65.87.170.103
     19 129.33.49.251
     18 65.54.188.68
     17 66.26.93.162
I know it's from MSN because it leaves the following in my log: "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
I assume over at MSN they are trying to scrape the Internet to build up their own web search engine. I am curious if others are seeing this same activity.
The commands I used for these queries were (as root in /var/log/httpd):

for msn bot:

    cat access_log | grep msnbot | awk '{ print $1 }' | sort | uniq -c | sort -gr | head

and for all hits:

    cat access_log | awk '{ print $1 }' | sort | uniq -c | sort -gr | head
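The two pipelines differ only in the grep step, so they could be folded into one small shell function. What follows is a rough sketch, not the commands that produced the numbers above: the name top_ips and its arguments are made up for illustration, it assumes the client IP is the first whitespace-separated field of each log line (as in Apache's common and combined log formats), and it uses the portable "sort -rn" in place of "sort -gr".

```shell
# top_ips LOGFILE [PATTERN]
# Hypothetical helper: print the ten busiest client IPs in LOGFILE.
# If PATTERN is given, only log lines matching it (case-insensitively)
# are counted. Assumes the client IP is the first field of each line.
top_ips() {
    logfile=$1
    pattern=${2:-}
    if [ -n "$pattern" ]; then
        # Keep only the lines that mention the pattern (e.g. a user agent).
        grep -i -- "$pattern" "$logfile"
    else
        # No pattern: count every request.
        cat -- "$logfile"
    fi | awk '{ print $1 }' | sort | uniq -c | sort -rn | head
}
```

With that defined, "top_ips access_log msnbot" should give the per-IP counts for the bot, and "top_ips access_log" the counts for all traffic.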
Greg
You can prevent this by adding a few lines to your Apache config file:
<Directory /var/www/htdocs>
    SetEnvIfNoCase User-Agent "msnbot" bad_bot
    Deny from env=bad_bot
</Directory>
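Once a rule like this is active and Apache has been reloaded, denied requests should be logged with status 403 (Forbidden), so one way to check that it is working is to count the status codes on the bot's log lines. This is only a sketch: the helper name bot_status is hypothetical, and it assumes the common or combined log format, where the status code is the ninth whitespace-separated field.

```shell
# bot_status LOGFILE [PATTERN]
# Hypothetical helper: count HTTP status codes for requests whose log
# line matches PATTERN (default: msnbot), most frequent first.
# Assumes common/combined log format, where field 9 is the status code.
bot_status() {
    grep -i -- "${2:-msnbot}" "$1" | awk '{ print $9 }' | sort | uniq -c | sort -rn
}
```

Running "bot_status access_log" after the change should show new msnbot requests accumulating under 403 instead of 200; older entries from before the change will still be counted under their original status.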
-- 
TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
TriLUG PGP Keyring : http://trilug.org/~chrish/trilug.asc
