--On March 26, 2006 2:16:13 PM -0500 Fred Atkinson <[EMAIL PROTECTED]> wrote:
>
>     You have the wildcard at the top.  You need to move 'User-agent: *,
> Disallow: /simpy/' to the end of the file.  It should be the very last
> entry.
> 
>     What happens is that Googlebot gets to the * and accepts the
> instructions there.  It never gets to its own individual entry.

This is wrong.

The spec [1] doesn't say anything about order being significant. One of the
examples in the spec shows the robot "cybermapper" matching a user-agent
line which is after a "*" entry.

A robot that stops at the first record it reads, treating "*" as a match
for itself, is not following the spec.

The spec text in the proposed RFC [2] (never adopted) is more specific
about this:

   These name tokens are used in User-agent lines in /robots.txt to
   identify to which specific robots the record applies. The robot
   must obey the first record in /robots.txt that contains a User-
   Agent line whose value contains the name token of the robot as a 
   substring. The name comparisons are case-insensitive. If no such
   record exists, it should obey the first record with a User-agent
   line with a "*" value, if present.
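
As a minimal sketch of the selection rule quoted above (not code from any
real robot), here is how it might look in Python. The function names
parse_records and select_record are my own, and the Googlebot record in
the sample file is a made-up stand-in, since the original file is not
shown in this thread.

```python
def parse_records(text):
    """Split robots.txt text into records: a list of (agents, rules)."""
    records = []
    agents, rules = [], []
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line ends the current record.
            if agents:
                records.append((agents, rules))
            agents, rules = [], []
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue  # comment-only or malformed line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif agents:
            rules.append((field, value))
    if agents:
        records.append((agents, rules))
    return records


def select_record(records, name_token):
    """Pick the rules a robot should obey, per the draft RFC wording.

    First choice: the first record with a User-agent value containing
    the robot's name token as a case-insensitive substring.
    Fallback: the first record with a "*" User-agent value.
    """
    token = name_token.lower()
    for agents, rules in records:
        if any(a != "*" and token in a.lower() for a in agents):
            return rules
    for agents, rules in records:
        if "*" in agents:
            return rules
    return None  # no applicable record


# Hypothetical file with the wildcard record first, as in the question.
SAMPLE = """\
User-agent: *
Disallow: /simpy/

User-agent: Googlebot
Disallow: /private/
"""

records = parse_records(SAMPLE)
googlebot_rules = select_record(records, "Googlebot")
other_rules = select_record(records, "SomeOtherBot")
```

Under this rule the position of the "*" record has no effect on which
record Googlebot obeys: the specific record is chosen whether it comes
before or after the wildcard.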

If you want to protect against this particular variety of robot
mis-coding, go ahead and move the "*" record to the end. I don't know
of any evidence of this behavior in the wild, so it might be a waste
of time.

[1] <http://www.robotstxt.org/wc/norobots.html>
[2] <http://www.robotstxt.org/wc/norobots-rfc.html>

wunder
--
Walter Underwood
Principal Software Architect, Autonomy
_______________________________________________
Robots mailing list
[email protected]
http://www.mccmedia.com/mailman/listinfo/robots
