"Klaus Johannes Rusch" wrote:
>> patterns within the uniqueness of an ID assignment, a pattern so recognizable
<snip>
>> Is checking for "*[no]index-[no]follow*" patterns actually a simple enough
>> decision tree to actually work?

>This approach resolves the issue that IDs have to be unique, however it does
>not address how to identify, without a DTD, which attributes actually are IDs.

I like the feedback, Klaus. Thanks.

The ID attribute name used in the original examples was inherited from the generic attributes of HTML, where the resource's DTD was one of the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

See the Generic Attributes, where ID is identified as "id". I believe that many robots check for this DTD when they load a resource.

Now if a DTD does not exist for that resource, then the 'id' term that identifies an ID is unknown, I agree. What is the robot to do with this resource? Very likely the robot will make a decision similar to the one it makes for any other resource without a DTD. Robots already make decisions that are due to a lack of DTD knowledge about the resource. In practice, robots minimize their work when faced with complexity they can't confirm (without a major clock cycle hit or a near-AI programming effort).

The DTD lets the robot know the document-wide attribute name that holds the unique id strings. If the DTD is missing, it is OK not to have the robot do the extra work of reverse engineering the resource to match a known DTD and thereby confirm the ID attribute names. The robot's time is very important, and it is overloaded as it is, so this is the practical switch in the decision tree that avoids a low return for the work effort. In less complex or less complete robots, the robot could take a leap of faith (a programming shortcut), only work on "id", and still achieve a majority of the end goal.
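
The decision tree described above could be sketched roughly as follows. This is only a minimal illustration in Python, not code from any existing robot. The names id_attribute_for, parse_directive and the leap_of_faith flag are made up for this sketch, and the regular expression is just one assumed spelling of the "*[no]index-[no]follow*" pattern.

```python
import re

# Public identifiers quoted in the message; for these DTDs the ID attribute is "id".
KNOWN_ID_ATTRIBUTE = {
    "-//W3C//DTD HTML 4.01 Transitional//EN": "id",
    "-//W3C//DTD HTML 4.01//EN": "id",
}

# Assumed spelling of the "*[no]index-[no]follow*" pattern (hypothetical).
DIRECTIVE = re.compile(r"(no)?index-(no)?follow")
PUBLIC_ID = re.compile(r'PUBLIC\s+"([^"]+)"')


def id_attribute_for(doctype, leap_of_faith=True):
    """Return the attribute name that holds IDs, or None if the robot should skip."""
    match = PUBLIC_ID.search(doctype or "")
    if match and match.group(1) in KNOWN_ID_ATTRIBUTE:
        return KNOWN_ID_ATTRIBUTE[match.group(1)]
    # No known DTD: either assume "id" or do no element-level work at all.
    return "id" if leap_of_faith else None


def parse_directive(id_value):
    """Return (index, follow) booleans, or None when the pattern is absent."""
    match = DIRECTIVE.search(id_value)
    if match is None:
        return None
    # A captured "no" prefix turns the corresponding flag off.
    return (match.group(1) is None, match.group(2) is None)
```

With these assumptions, an id value such as "hdr-noindex-follow-1" would be read as "do not index, do follow", and a resource without a recognized DTD is either handled with "id" only or skipped, depending on how much work the robot is willing to do.
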
>A "cleaner" way of implementing something like this might be using an attribute
>in a different namespace, like this:
><html xmlns:robots="http://www.robotstxt.org/2002/robots1">
>....
><div robots:robots="noindex,follow">
> header nav area
></div>
<snip>
></html>
>
>or for easier processing, use one attribute per axis:
>
><html xmlns:robots="http://www.robotstxt.org/2002/robots2">
>...
><div robots:index="false" robots:follow="true">
> header nav area
></div>
>...
></html>

Humm... Exploring the possible "cleaner" namespace implementation:

What is the effort to implement this quickly in a good percentage of the existing resource creation and handling tools? (Any feelings on the magnitude of work?)

Is this new suggestion easier or harder overall than the "id" pattern method, when balancing out the resource handling efforts?

The question helps find a path of least resistance to achieving the largest part of the needed result sooner rather than later. Given a namespace method or an "id" pattern method, which path can be taken by all involved tools, given the current state of those tools?

If the "id" pattern method is selected, could the "id" pattern be extracted later for use in a much more rigorous namespace implementation, say as tools grow in support of namespace methods? On detection of the "id" pattern, the meaning is extractable and can be converted to the namespace form.

So back we go to the practicalities of achieving the largest part of the needed result sooner rather than later. Could showing the "id" pattern method be more of a catalyst, one which triggers the reaction that allows element-level indexing to begin? Where the final best form is not easily reached in a single step, two steps can compensate for the temporal drag which resists the state change.

>While syntactically correct, the semantics are still unclear, e.g.
>* what does "nofollow" mean for text

If a robot takes the time to determine whether the text includes URL or URI information, then on detection that robot is highly recommended not to follow them. Where they serve as external references, those URIs may be best ignored by the robot. If used with "index", any example URIs in their non-hypertext text form remain retrievable as indexed text. Example text such as "http://ibm.com exists but is not a reference link in this context." yes/no?

>* how should a robot index and present something like
> <span robots:robots="index">This is a <span robots:robots="noindex">top secret</span> word</span>
>
> As "This is a word"?

Yes. For robots, the spanned words "top secret" are not an information security concern but an indexing quality concern. If that "top secret" span were placed in the index, the index's quality of use could be compromised. Any content security stays handled at the access level.

-Thomas Kay
[EMAIL PROTECTED]

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]).
For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
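
As a rough illustration of the nested span answer above ("This is a word"), here is a minimal sketch in Python using the standard library's html.parser. It assumes the robots:robots attribute from Klaus's proposal, which is not an existing standard, and it assumes that an element inherits the index state of its enclosing element unless it sets its own. The names IndexableText and indexable_text are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class IndexableText(HTMLParser):
    """Collect text, honouring a hypothetical robots:robots attribute."""

    def __init__(self):
        super().__init__()
        self.stack = [True]  # index state per open element; default is "index"
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        state = self.stack[-1]  # inherit from the enclosing element
        value = dict(attrs).get("robots:robots") or ""
        tokens = {token.strip().lower() for token in value.split(",")}
        if "noindex" in tokens:
            state = False
        elif "index" in tokens:
            state = True
        self.stack.append(state)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack[-1]:
            self.parts.append(data)


def indexable_text(markup):
    """Return the text a robot would index, with whitespace collapsed."""
    parser = IndexableText()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Fed the nested span example from the question, indexable_text returns "This is a word": the inner "noindex" span is dropped from the index while the surrounding text is kept.
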
