On Tue, 20 Nov 2001, Alan Perkins wrote:
> > > For example, Inktomi Enterprise Search uses <!--stopindex--> and
> > <!--startindex--> to turn indexing off and on within a page. Other
> > engines use different tags.
>
> htDig supports by default <!--htdig_noindex--> , <!--/htdig_noindex-->
> (configurable), plus (older?) non-DTD <noindex> and </noindex>
>
> It would be useful to have a "standard" for this over for all global search
> engines. Something like <robot instruc="noindex,nofollow"> ... </robot> to
> allow finer grained manipulation than the meta robots tag allows. NOINDEX
> and NOFOLLOW attributes for all tags that supported HREF attributes would
> also be handy...particularly for e-mail addresses.

Agreed. I also think a per-page anti-keyword list might be useful, for when a name or word occurs multiple times in a page.

I don't share Nicholas Carroll's reservations about "stopword", and I think that <meta name="stopwords" content="key1, key2 .."> as the opposite of "keywords" would not cause any confusion; it's implicit that meta tags are per-page elements. "nonwords", to me, conjures up images of, well, non-words like "23.446" or "#%$!!@@@@!".

Regarding a <robot> HTML element, I think it would naturally be ignored by existing agents and browsers, yet be parsable within a DTD.

Questions of precedence would need to be addressed. I believe that if a page is disallowed in robots.txt it is never even visited, so robots.txt takes precedence over <meta name=robots content=index>. That, in turn, may prevent the body of the page from being parsed; if it doesn't, I was wondering whether it would make sense to be able to say

  <head><meta name=robots content=noindex></head>
  <body>
  don't index this page
  <robot instruc="index"> except this bit </robot>
  </body>

Otherwise, the tag could possibly be simplified even further to e.g.

  <noindex>don't index this</noindex>

(we'd just have to get it into the DTD).

(Hmm, maybe we still want to distinguish "index" from "follow" ...)

(I don't really care for the word fragment "instruc". "action", maybe?)
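For illustration, here is a minimal Python sketch of how an indexer might drop the marked regions before tokenizing a page, using the three marker pairs mentioned above. It assumes the markers are matched case-insensitively, do not nest, and that an unterminated start marker runs to the end of the page; the function name strip_noindex and the marker table are hypothetical and not taken from htDig's or Inktomi's actual code.

```python
import re

# Hypothetical marker table built from the conventions mentioned above.
# Each (start, stop) pair encloses text that should be left out of the index.
NOINDEX_MARKERS = [
    ("<!--htdig_noindex-->", "<!--/htdig_noindex-->"),  # htDig default (configurable)
    ("<!--stopindex-->", "<!--startindex-->"),          # Inktomi Enterprise Search
    ("<noindex>", "</noindex>"),                        # older non-DTD form
]


def strip_noindex(html: str) -> str:
    """Replace every marked region with a space so neighbouring words stay apart.

    Assumptions: markers are case-insensitive, regions do not nest, and a
    start marker with no matching stop marker suppresses the rest of the page.
    """
    for start, stop in NOINDEX_MARKERS:
        # Non-greedy match up to the nearest stop marker, or to end of input.
        pattern = re.compile(
            re.escape(start) + r".*?(?:" + re.escape(stop) + r"|\Z)",
            re.IGNORECASE | re.DOTALL,
        )
        html = pattern.sub(" ", html)
    return html


if __name__ == "__main__":
    page = ("<body>keep this <!--stopindex-->hide this<!--startindex--> "
            "and this <!--htdig_noindex-->hide too<!--/htdig_noindex--></body>")
    print(strip_noindex(page))
```

A real indexer would need to do this on the raw HTML before stripping tags, since the htDig and Inktomi markers are themselves comments that a tag stripper would otherwise discard along with the information they carry.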
Andrew Daviel, TRIUMF, Canada
also Vancouver Webpages

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
