> It seems like what you want here is for browsers to parse as they do now, but
> a particular subset of browser-accepted syntax to be enshrined so that when
> defining your restrictions over content you control you can just say "follow
> the spec" instead of "follow the spec and don't put '>' in attribute values",
> right?

That is not the idea. I'll try to explain in more depth. The problem has its
source in XML:

XML attribute values allow any character except "<", "&" and the string
delimiter of the value, which can be " or '.
- Why is "<" forbidden? The answer is: to make sure that we are at the
beginning of a tag whenever we meet this character. But we can notice that
parsing would still be possible if "<" were allowed in attribute values.
For example, <entity1 att1="<ok>"> could be parsed without error. So "<" is
forbidden not to make parsing possible, but to make the parsing process
easier and more secure.
- "&" as a literal value is forbidden because it is the escape character for
special characters.
- The string delimiter is forbidden because it has to be escaped to ensure
correct parsing.
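
To illustrate the first point, here is a minimal sketch in Python of a scanner
that tolerates "<" and ">" inside quoted attribute values. The function name
find_tag_end is made up for this example, and it is not how any real XML
parser is written; it only shows that the end of the tag can still be found if
the scanner keeps track of the quotes.

```python
def find_tag_end(text, start):
    """Return the index of the '>' closing the tag that opens at text[start].

    Quote-aware: '<' and '>' inside a quoted attribute value are skipped.
    Returns -1 if the tag is never closed.
    """
    quote = None  # current attribute delimiter, or None when outside a value
    for i in range(start + 1, len(text)):
        ch = text[i]
        if quote is not None:
            if ch == quote:      # closing delimiter of the attribute value
                quote = None
        elif ch in ('"', "'"):   # opening delimiter of an attribute value
            quote = ch
        elif ch == ">":          # first '>' outside any value ends the tag
            return i
    return -1


sample = '<entity1 att1="<ok>">text'
end = find_tag_end(sample, 0)
print(sample[:end + 1])
```

The price is that every character of every attribute has to be examined just
to locate the end of the tag.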

We can see here that the forbidden characters in XML were chosen to ensure a
certain quality of parsing. Nevertheless, a parser of XML content is still
obliged to parse all the attributes whenever it meets an element, just to find
where the tag ends. This could be avoided if ">" were forbidden in attribute
values. Maybe browsers don't care about this because they want to parse all
attributes anyway. But nowadays there are many other purposes besides display
that involve parsing HTML content. Parsing could be faster and more secure for
all purposes (I mean not only for browsers) if ">" were forbidden and had to be
replaced with "&gt;".
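
As a rough sketch of the gain, assuming a tool that only wants to skip over a
tag without reading its attributes (skip_tag is a hypothetical helper written
for this message, not part of any library): if ">" never appears raw in
attribute values, a single search for ">" is enough.

```python
def skip_tag(text, start):
    """Return the index just past the tag that opens at text[start].

    Assumes no raw '>' inside attribute values, so the first '>' found
    is the end of the tag. Returns -1 if there is no '>' at all.
    """
    end = text.find(">", start)
    return -1 if end == -1 else end + 1


escaped = '<a title="1 &gt; 0" href="x">link</a>'
raw = '<a title="1 > 0" href="x">link</a>'

# With "&gt;" the search lands on the real end of the tag.
print(escaped[skip_tag(escaped, 0):])
# With a raw ">" the same search stops in the middle of the attribute value.
print(raw[skip_tag(raw, 0):])
```

With the raw ">" the simple search gives a wrong result, which is why such
tools currently have to walk through the attributes instead.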

That is more about XML, but what about HTML? Replacing ">" with "&gt;" is
already good practice in both XML and HTML. Some HTML attributes already
forbid it (it is allowed in CDATA attributes, forbidden in %Text attributes).
Since XML 2 has been stopped, I think this is an opportunity for HTML to turn
the good practice into a new restriction, and at the same time lighten parsing
processes that are not browser related.

Why change the HTML spec instead of adding our own restriction when we want
">" to be forbidden? Because I think we should all want ">" to be forbidden.
Using it directly in HTML attribute values is already rather discouraged. We
can always use "&gt;" instead of ">", as we already use "&lt;" instead of "<".
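
For content producers the change costs almost nothing, since common escaping
routines already do it. A small sketch using Python's standard html.escape
(the attr helper is only an illustration, not an existing API):

```python
from html import escape


def attr(name, value):
    """Build one attribute, escaping &, <, > and quotes in the value."""
    return f'{name}="{escape(value, quote=True)}"'


print(attr("title", "1 > 0 & 2 < 3"))
```

The resulting attribute contains "&gt;", "&amp;" and "&lt;" in place of the
raw characters.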

I understand that browser developers do not feel concerned by this, because
parsing works well as is for them.
And I admit the problem I've explained is due more to XML than to HTML.

Skrol29
