[Robots] Re: Anti-thesaurus proposal

Alan Perkins 21 Nov 2001 16:55:51 -0000


On Wednesday, November 21, 2001, Andrew Daviel wrote:


> Questions of precedence would need to be addressed. I believe that
> if a page is listed in robots.txt that it is never even visited,
> so robots.txt has precedence over <meta name=robots content=index>.

Strictly speaking, robots.txt controls reading but NOT indexing, whereas
Meta Tags cannot prevent the reading of the document that contains them
(doh!) but CAN prevent that document being indexed once it has been read.

In other words: in theory, an indexing robot could create an index entry for
a document A when it encounters a reference to document A in another
document, B.  When the robot comes to read document A it can be prevented
from reading it by a robots.txt file on document A's server.  In theory,
this does NOT mean it needs to remove its index entry for document A (in
practice it usually does).  Continuing the theory, if document A contained a
noindex robots meta tag, the robot would be prevented from reading that tag
by the robots.txt file, so could still maintain its index entry and be
"within the rules".

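To make the read/index distinction concrete, here is a rough Python sketch of
the decision a robot could make.  It is only an illustration under
assumptions: the function name decide, the class RobotsMetaParser and the
agent name "ExampleBot" are made up for this example; only
urllib.robotparser and html.parser are real standard library modules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name=robots content=...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def decide(robots_txt, url, html, agent="ExampleBot"):
    """Return (may_read, may_index) for one URL.

    robots_txt is the text of the server's robots.txt; html is the document
    the robot would receive if it were allowed to fetch the URL.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Reading is forbidden, so any meta tag inside the page is never
        # seen.  robots.txt by itself says nothing about an index entry
        # that was built from links found in other documents.
        return False, True
    meta = RobotsMetaParser()
    meta.feed(html)
    return True, "noindex" not in meta.directives
```

With a robots.txt that disallows /private/, a page under /private/ comes
back as (False, True) even if it carries a noindex meta tag, which is exactly
the loophole described above.  A readable page with the noindex tag comes
back as (True, False).
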
> <head><meta name=robots content=noindex></head><body>
> don't index this page
> <robot instruc="index">
> except this bit
> </robot>
> </body>

Far simpler would be for a "no" in the head meta tag to be a "no" for the
whole document, whereas the lack of a "no" would indicate partial or total
acceptance.  E.g. "noindex" in the header means "Do not index this
document", whereas "index" (or nothing) in the header means "index some or
all of this document".
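
A small Python sketch of how a robot might apply that rule, purely as an
illustration.  The <robot> element and its "instruc" attribute come from the
proposal quoted above and are not part of any DTD; the attribute value
"noindex" for excluding a region, and the names PartialIndexParser and
indexable_text, are assumptions made up for this example.

```python
from html.parser import HTMLParser


class PartialIndexParser(HTMLParser):
    """Gathers text, skipping regions inside <robot instruc="noindex">."""

    def __init__(self):
        super().__init__()
        self.page_noindex = False  # set by a "noindex" robots meta tag
        self.open_robot_tags = []  # instruc values of open <robot> elements
        self.chunks = []           # text that may be indexed

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower()
            if "noindex" in [d.strip() for d in content.split(",")]:
                self.page_noindex = True
        elif tag == "robot":
            # Hypothetical element from the proposal, not real HTML.
            self.open_robot_tags.append((attr.get("instruc") or "").lower())

    def handle_endtag(self, tag):
        if tag == "robot" and self.open_robot_tags:
            self.open_robot_tags.pop()

    def handle_data(self, data):
        if "noindex" not in self.open_robot_tags and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text chunks a robot may index under the proposed rule."""
    parser = PartialIndexParser()
    parser.feed(html)
    parser.close()
    # A "no" in the head is a "no" for the whole document.
    return [] if parser.page_noindex else parser.chunks
```

The point of the design is that the head tag only ever needs one meaning: a
"noindex" there switches everything off, and anything finer-grained is left
to the tags in the body.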

<dreaming>A robots HTTP request/response header field would also be very
nice </dreaming>

> otherwise the tag could be possibly simplified yet further to e.g.
> <noindex>don't index this</noindex> (just have to get it in the DTD)
> (Hmm, maybe we still want to distinguish "index" from "follow" ...)

Yes, being able to control links being followed is very useful.  The idea of
one tag with attributes was to make upgrades smoother.

> (I don't really care for the wordfragment "instruc". "action" maybe?)

I don't care for "instruc" either...but I'm not too fussy, as long as it is
future-proof.

There are some other things that could be useful, too, e.g. passing
instructions to named (or unnamed) robots or classes of robot.  Tags are
better for this.

Is anybody interested enough in moving this forward to do something about
it?  And does anybody know how to do something about it - officially, I
mean?  Surely, if it makes sense for site search, it makes sense for global
search.

Alan Perkins, Chief Technology Officer
e-Brand Management Limited




--
This message was sent by the Internet robots and spiders discussion list 
([EMAIL PROTECTED]).  For list server commands, send "help" in the body of a message 
to "[EMAIL PROTECTED]".
