-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have noticed over the last few weeks that MSN's search bot (
http://search.msn.com/msnbot.htm ) is ignoring robots.txt entries.

My first indication was when I noticed that its hit count was much
higher than that of any other crawling/slurping bot.  I began scanning
logs and found instances where msnbot had directly requested items
under directories that robots.txt disallows.
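
For anyone who wants to run the same kind of check against their own
logs, here is a minimal sketch in Python.  It assumes the access log is
in the common "combined" format and that the rules of interest are in
the generic "User-agent: *" record; the function name blocked_hits and
its parameters are made up for illustration, so adjust to taste.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed layout (combined log format):
# host ident user [time] "METHOD path proto" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def blocked_hits(log_lines, robots_lines, agent_substring="msnbot"):
    """Return paths requested by the bot that the generic (*) rules disallow."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        # Skip unparseable lines and requests from other user agents.
        if not match or agent_substring not in match.group("agent").lower():
            continue
        path = match.group("path")
        # Test against the wildcard record, i.e. the rules every bot should obey.
        if not rules.can_fetch("*", path):
            hits.append(path)
    return hits
```

Feed it the lines of an access log and the lines of robots.txt; any
path it returns is a request the bot should not have made.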

After running some Web searches I soon discovered that the problem is
widespread and not limited to my servers.  This page -
http://algorhythm.org/archives/2003/06/27/bad_msnbot_bad.html - has a
good rundown of the issue, including M$' response to it (in summary,
they don't care).

I decided to block that bot entirely from the servers I control.  Even
though the bot ignores most of what it finds in robots.txt, it currently
honors records that name it explicitly in the User-agent line.  This
makes it easy to block from within robots.txt:

User-agent: msnbot
Disallow: /
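
If you want to sanity-check the record before deploying it, Python's
standard urllib.robotparser module can evaluate it.  This is only a
rough sketch: the helper name is_allowed is invented here, and it shows
how a well-behaved parser reads the rule, not what msnbot itself will
actually do.

```python
from urllib.robotparser import RobotFileParser

# The record from this message, verbatim.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""


def is_allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Compare the blocked bot with an agent the record does not mention.
    for agent in ("msnbot", "Googlebot"):
        print(agent, is_allowed(agent, "/some/page.html"))
```

Since the file contains no "User-agent: *" record, agents other than
msnbot are unaffected by it.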

Hope this helps,
Brian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFDR/yr8iwHek1OcGYRAn/GAJ0YpUxMCp+Bw0rKLlTuBF8Su6rELQCgvfAW
Qmbo7XcD71/6N0hJJqx9UJk=
=aaOl
-----END PGP SIGNATURE-----

_______________________________________________
RLUG mailing list
[email protected]
http://lists.rlug.org/mailman/listinfo/rlug
