Create a file called robots.txt in your web server's document root (so that it is served as /robots.txt) and populate it with:

User-agent: msnbot
Disallow: /
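
If you would rather script it, here is a minimal sketch. The function name write_robots is made up for illustration, and the document root path in the usage comment is only an assumption; use whatever your server is actually configured with. Keep in mind robots.txt is advisory: it only helps with crawlers that choose to honor it.

```shell
# Write a robots.txt that asks msnbot to stay away from the whole site.
# $1 is the document root directory (you supply it; nothing is assumed here).
write_robots() {
    cat > "$1/robots.txt" <<'EOF'
User-agent: msnbot
Disallow: /
EOF
}

# Example call (the path is an assumption, adjust for your server):
# write_robots /var/www/html
```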


Jeff G.

[EMAIL PROTECTED] wrote:

The following is the number of hits from MSN bot, from all MSN bot IP 
addresses, to my webserver (through ALL historical logs I still have around):

  1227 65.54.188.69
    58 65.54.188.70
    42 65.54.188.64
    18 65.54.188.68
     4 65.54.188.67


If I look at all traffic to my website, MSN bot is still on top:

  1227 65.54.188.69
   127 192.58.204.226
    59 65.54.188.70
    42 65.54.188.64
    29 64.244.30.79
    24 66.196.91.227
    19 65.87.170.103
    19 129.33.49.251
    18 65.54.188.68
    17 66.26.93.162


I know it's from MSN because it leaves the following in my log: "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"

I assume over at MSN they are trying to scrape the Internet to build up their 
own web search engine.  I am curious if others are seeing this same activity.

The command I used for these queries was (as root in /var/log/httpd):

For MSN bot:
cat access_log | grep msnbot | awk '{ print $1 }' | sort | uniq -c | sort -gr | head

and

For all hits:
cat access_log | awk '{ print $1 }' | sort | uniq -c | sort -gr | head
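
The two pipelines above could also be folded into one small shell function. This is only a sketch: the name top_hits is made up for illustration, and it assumes the client address is the first whitespace-separated field of each log line, as in the logs above. It uses sort -nr in place of sort -gr, since the counts are plain integers and -n is the more portable flag.

```shell
# top_hits LOGFILE [PATTERN]
# Print the ten most frequent client addresses in LOGFILE.
# If PATTERN is given, only lines matching it are counted.
top_hits() {
    log=$1
    pattern=${2:-}
    if [ -n "$pattern" ]; then
        grep -- "$pattern" "$log"
    else
        cat "$log"
    fi | awk '{ print $1 }' | sort | uniq -c | sort -nr | head
}
```

With that defined, "top_hits access_log msnbot" would give the first listing and "top_hits access_log" the second.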

Greg





