Hi - I hope this is the right list.

I have a suggestion for a new attribute that could be added to the (X)HTML standard.

The attribute is for search engines, to instruct them not to index part of a page.

What I'm currently doing in xhtml is this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!ATTLIST div spider (on | off) #IMPLIED>
]>

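The added attribute is spider, which takes a value of on or off. The declaration above covers only div; any other element that carries the attribute, such as p or img, would need its own ATTLIST line.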

I'm using it in a modification I'm working on of the open-source Sphider search engine. The idea is to avoid using HTML comments to turn indexing on and off for part of a page.

The actual name and values of such an attribute are definitely open to discussion, but I think it should not be specific to any one search crawler.

Example of use -

<p>This paragraph is indexed</p>
<p spider="off">This paragraph is not indexed</p>
<p>This paragraph is indexed</p>
<div spider="off">
  <p>This paragraph is not indexed</p>
  <p spider="on">This paragraph is indexed</p>
</div>
<img src="foo.jpg" alt="[This image is indexed]" />
<img src="bar.gif" spider="off" alt="[This image is not indexed]" />

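The default is on. A node takes the value set by its nearest ancestor unless it sets spider itself, so a child can turn indexing back on inside a parent that turned it off.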
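
To make the inheritance rule concrete, here is a rough sketch of how an indexer might resolve it. This is not how Sphider does it; it is Python using only the standard library, and the function name indexable_text, the choice to treat alt text as an image's indexable content, and the un-namespaced sample markup are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def indexable_text(element, inherited="on"):
    """Collect text from nodes whose effective spider state is "on".

    Hypothetical helper: assumes un-namespaced element names and
    treats an img's alt text as its indexable content.
    """
    # An explicit spider attribute overrides the state inherited from the parent.
    state = element.get("spider", inherited)
    parts = []
    if state == "on":
        if element.text and element.text.strip():
            parts.append(element.text.strip())
        if element.tag == "img" and element.get("alt"):
            parts.append(element.get("alt"))
    for child in element:
        parts.extend(indexable_text(child, state))
        # Tail text follows the child but belongs to this element.
        if state == "on" and child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


SAMPLE = """<body>
<p>This paragraph is indexed</p>
<p spider="off">This paragraph is not indexed</p>
<div spider="off">
  <p>Hidden</p>
  <p spider="on">Shown again</p>
</div>
<img src="bar.gif" spider="off" alt="[This image is not indexed]" />
</body>"""

print(indexable_text(ET.fromstring(SAMPLE)))
# prints ['This paragraph is indexed', 'Shown again']
```

A real crawler would also have to cope with the XHTML namespace and with tag-soup HTML, which this sketch ignores.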

It would be useful for things like navigation areas, images/multimedia you specifically do not want engines to index, signature areas of bulletin boards, etc.

Of course, search engines would need their indexers to respect it, which is why a standard attribute is so desirable. With a standard, many search engines would likely implement it, because when webmasters use it properly it improves the usefulness of the search results.

Thoughts?
