Human Resources         Développement des ressources
Development Canada              humaines Canada
______________________________________________________

anti-keywords 
or 
anti-keywordareas   

http://www.w3.org/TR/html4/struct/global.html#edef-DIV   

Does anyone use DIV elements in HTML to mark the "noindex", "nofollow", 
"follow" and "index" parts of a page as block areas? The emerging web content 
management systems may have done something along these lines for their own 
embedded search/retrieval benefit, but this group should have a better idea on 
the subject.

So has anyone seen or done anything like the following?

<div id="robots-txt-noindex-follow" class="robots">
{headers/footers/sidebars}
</div>

<div id="robots-txt-noindex-nofollow" class="robots">
{a banner area }
</div>

<div id="robots-txt-index-nofollow" class="robots">
{ content for the index, but holds looping links or dynamically generated 
links which are best navigated via the state-data-less sitemap links. }
</div>

The following is assumed for all areas, but can be stated explicitly:
<div id="robots-txt-index-follow" class="robots">
{ content for the index }
</div>
If it is stated explicitly on even a single block, then all other blocks not 
already defined as above are treated as "noindex, follow".
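
To make the idea concrete, here is a rough sketch of how an indexer might read 
these DIV ids. It is only an illustration under my own assumptions: the class 
and function names (DivRobotsParser, extract) are made up, unmarked nested DIVs 
are assumed to inherit the policy of their parent, and the check for an 
explicit "index-follow" block is a crude substring test rather than a real 
pre-parse.

```python
from html.parser import HTMLParser

PREFIX = "robots-txt-"  # id prefix taken from the examples above


class DivRobotsParser(HTMLParser):
    """Collect indexable text and followable links according to DIV ids."""

    def __init__(self, default=(True, True)):
        super().__init__()
        self.default = default  # (index, follow) outside any marked block
        self.stack = []         # one (index, follow) entry per open <div>
        self.text = []
        self.links = []

    def current(self):
        return self.stack[-1] if self.stack else self.default

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            policy = self.current()  # assumption: unmarked divs inherit
            div_id = attrs.get("id") or ""
            if div_id.startswith(PREFIX):
                flags = div_id[len(PREFIX):].split("-")
                policy = ("noindex" not in flags, "nofollow" not in flags)
            self.stack.append(policy)
        elif tag == "a" and attrs.get("href") and self.current()[1]:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.current()[0] and data.strip():
            self.text.append(data.strip())


def extract(html):
    """Return (indexable_text, followable_links) for one page."""
    # If any block is explicitly "index-follow", everything left unmarked
    # falls back to "noindex, follow" (crude substring check, see lead-in).
    explicit = (PREFIX + "index-follow") in html
    parser = DivRobotsParser(default=(False, True) if explicit else (True, True))
    parser.feed(html)
    parser.close()
    return parser.text, parser.links
```

Calling extract() on a page returns the text fragments that would go into the 
index and the hrefs that would be queued for crawling.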

I would like to get comments and suggestions on the use of defined DIV id 
names to improve indexing processes (global or local).

-Thomas Kay
Information Resource Management, Corporate Systems, Systems, National 
Headquarters, Human Resources Development Canada, Government of Canada.  
[EMAIL PROTECTED]
---------- Original Text ----------

From: "Andrew Daviel" <[EMAIL PROTECTED]>, on 21/11/2001 9:00 AM:


On Tue, 20 Nov 2001, Alan Perkins wrote:

> 
> > For example, Inktomi Enterprise Search uses <!--stopindex--> and
> > <!--startindex--> to turn indexing off and on within a page. Other
> > engines use different tags.
> 

htDig supports by default <!--htdig_noindex--> , <!--/htdig_noindex--> 
(configurable), plus (older?) non-DTD <noindex> and </noindex>
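
For comparison, a minimal sketch of how an indexer could honour comment 
markers of this kind, assuming they act as simple off/on toggles as described 
above. The function name strip_noindex and its parameters are hypothetical and 
are not taken from Inktomi or htDig; an unclosed "off" marker is assumed to 
suppress the rest of the page.

```python
import re


def strip_noindex(html, off="<!--stopindex-->", on="<!--startindex-->"):
    """Drop everything between an 'off' marker and the next 'on' marker."""
    pattern = re.compile(
        re.escape(off) + r".*?(?:" + re.escape(on) + r"|\Z)",
        re.DOTALL,  # let the skipped region span several lines
    )
    return pattern.sub("", html)
```

The same function covers the htDig-style pair by passing 
"<!--htdig_noindex-->" and "<!--/htdig_noindex-->" as the two markers.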

> It would be useful to have a "standard" for this for all global search
> engines.  Something like <robot instruc="noindex,nofollow"> ... </robot> to
> allow finer grained manipulation than the meta robots tag allows.  NOINDEX
> and NOFOLLOW attributes for all tags that supported HREF attributes would
> also be handy...particularly for e-mail addresses.

Agreed. I also think the per-page anti-keyword list might be useful,
if a name or word occurs multiple times in a page. I don't share
 Nicholas Carroll's reservations about "stopword" and think that
<meta name="stopwords" content="key1, key2 .."> as the opposite
of "keywords" would not cause any confusion - it's implicit that
meta-tags are per-page elements. "nonwords" to me conjures up images of, 
well, non-words like "23.446" or "#%$!!@@@@!".
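
As a rough illustration of the per-page idea, here is a sketch of an indexer 
dropping the listed words before indexing. The "stopwords" meta name is only 
the proposal above, not an existing standard, and the names 
StopwordsMetaParser and index_terms are made up for this example.

```python
import re
from html.parser import HTMLParser


class StopwordsMetaParser(HTMLParser):
    """Gather page words plus any <meta name="stopwords"> entries."""

    def __init__(self):
        super().__init__()
        self.stopwords = set()
        self.words = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "stopwords":
            content = attrs.get("content") or ""
            # Comma-separated list, compared case-insensitively.
            self.stopwords.update(
                w.strip().lower() for w in content.split(",") if w.strip()
            )

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9']+", data.lower()))


def index_terms(html):
    """Return the page's words minus its own per-page stopwords."""
    parser = StopwordsMetaParser()
    parser.feed(html)
    parser.close()
    return [w for w in parser.words if w not in parser.stopwords]
```

Because the filter is applied only after the whole page has been read, the 
meta tag takes effect wherever it appears in the document.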

Regarding a <robot> HTML element, it would I think be naturally
ignored by existing agents and browsers yet parsable within a DTD.
Questions of precedence would need to be addressed. I believe that
if a page is listed in robots.txt, it is never even visited,
so robots.txt has precedence over <meta name=robots content=index>.
That in turn may prevent the body of the page being parsed,
otherwise I was wondering if it made sense to be able to say

<head><meta name=robots content=noindex></head><body>
don't index this page
<robot instruc="index">
except this bit
</robot>
</body>

otherwise the tag could be possibly simplified yet further to e.g.
<noindex>don't index this</noindex> (just have to get it in the DTD)
(Hmm, maybe we still want to distinguish "index" from "follow" ...)

(I don't really care for the word fragment "instruc". "action" maybe?)


Andrew Daviel, TRIUMF, Canada
also Vancouver Webpages


--
This message was sent by the Internet robots and spiders discussion list 
([EMAIL PROTECTED]).  For list server commands, send "help" in the body of 
a message to "[EMAIL PROTECTED]".

