--On March 26, 2006 2:16:13 PM -0500 Fred Atkinson <[EMAIL PROTECTED]> wrote:

> You have the wildcard at the top. You need to move 'User-agent: *,
> Disallow: /simpy/' to the end of the file. It should be the very last
> entry.
>
> What happens is that Googlebot gets to the * and accepts the
> instructions there. It never gets to its own individual entry.

This is wrong. The spec [1] doesn't say anything about order being
significant. One of the examples in the spec shows the robot
"cybermapper" matching a user-agent line which is after a "*" entry.
A robot which implements "first match" is not following the spec.

The spec text in the proposed RFC [2] (never adopted) is more specific
about this:

   These name tokens are used in User-agent lines in /robots.txt to
   identify to which specific robots the record applies. The robot
   must obey the first record in /robots.txt that contains a
   User-Agent line whose value contains the name token of the robot
   as a substring. The name comparisons are case-insensitive. If no
   such record exists, it should obey the first record with a
   User-agent line with a "*" value, if present.

If you want to protect against this particular variety of robot
mis-coding, go ahead. I don't know if there is any evidence of this
behavior in the wild, so it might be a waste of time.

[1] <http://www.robotstxt.org/wc/norobots.html>
[2] <http://www.robotstxt.org/wc/norobots-rfc.html>

wunder
--
Walter Underwood
Principal Software Architect, Autonomy
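
For illustration, here is a minimal Python sketch of the record
selection rule in the draft RFC text quoted above: look for a record
naming the robot first, and fall back to the "*" record only if none
exists, regardless of where "*" appears in the file. The function
names, the sample file, and the empty Disallow line for Googlebot are
assumptions made up for this example; they are not taken from the
thread or from any real crawler.

```python
def parse_records(text):
    """Split robots.txt text into (agents, disallows) records.

    Simplified: only User-agent and Disallow fields are handled.
    """
    records = []
    agents, disallows = [], []
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("#"):
            # Comment-only lines are discarded and do not end a record.
            continue
        line = stripped.split("#", 1)[0].strip()
        if not line:
            # A blank line ends the current record.
            if agents:
                records.append((agents, disallows))
            agents, disallows = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            disallows.append(value)
    if agents:
        records.append((agents, disallows))
    return records


def select_record(records, name_token):
    """Pick the record a robot should obey, per the draft RFC wording."""
    token = name_token.lower()
    # First pass: a record whose User-agent value contains the robot's
    # name token as a case-insensitive substring.
    for agents, disallows in records:
        if any(a != "*" and token in a.lower() for a in agents):
            return agents, disallows
    # Second pass: only if nothing matched, the first "*" record.
    for agents, disallows in records:
        if "*" in agents:
            return agents, disallows
    return None


# Hypothetical file with the wildcard record at the top.
ROBOTS_TXT = """\
User-agent: *
Disallow: /simpy/

User-agent: Googlebot
Disallow:
"""

records = parse_records(ROBOTS_TXT)
googlebot_record = select_record(records, "googlebot")
other_record = select_record(records, "otherbot")
```

With this sample file, a robot using the name token "googlebot" gets
its own record even though the "*" record comes first, and any other
robot falls back to the "*" record.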
