Benjamin Hawkes-Lewis wrote:
Calogero Alex Baldacchino wrote:
[...]


I think you're confusing parsing rules that conforming user agents must follow to associate identifiers with elements (even when ids are duplicated) with the authoring rules that conforming documents must follow (ids must be unique).

Ok, so what's what?

When you read "The value must not contain any space characters.", is that an authoring rule for conforming documents, in your view? Ok.

When you read "*If the value is not the empty string, user agents must associate the element with the given value (exactly, including any space characters)* for the purposes of ID matching within the subtree the element finds itself (e.g. for selectors in CSS or for the |getElementById()| method in the DOM).", is that a parsing rule for conforming user agents, in your view? Ok. But isn't it worth spending a word, everywhere in the spec, to say when a rule is a quirk kept for backward compatibility, which might go away in the future, and when it is not, because no such quirk is needed? And when it is a leftover from the past, shouldn't it be considered in every aspect? After all, wasn't one of the main goals of HTML 5 to turn unwritten, browser-specific rules into written, standard behaviours?

I mean, if you allow space characters inside an id value as a parsing rule, you can face something like '<div id="foo bar" >', that is, an id consisting of more than one token. Is it good to leave it untouched? Yes? Ok, but what does that mean for CSS, since style sheets are cited as one reason to allow space characters? That is, can a browser handle an id selector that starts with the '#' character and is broken by a blank space? Or better, is that legal in CSS? Honestly, I don't remember well; I've never tried something like that (since it makes no sense to me), and I think it's illegal. But let's say it is illegal for conforming style sheets, while existing user agents may or may not allow it, each with its own behaviour. If we "close one eye" to '<div id="foo bar" >' in a piece of HTML 5 code but leave its CSS counterpart to each implementation, we'll solve only half of the problem (where the problem is turning unwritten rules into written, and possibly improved, standards), won't we? Any kind of "CSS quirk" would be outside the scope of an HTML specification, though, and I believe '<div id="foo bar" >' is a problem. (If instead "foo bar" is not a valid id selector for CSS in any browser, that means we're allowing user agents to accept as valid an id which is inconsistent with CSS, and so CSS selectors cannot be a reason to allow space characters inside an id string, at least with respect to any direct reference to the identifier value.)

It might also be a problem per se, even just for HTML conformance by user agents: a URL fragment might contain escaped space characters, but an escaped space isn't the same thing as the space character itself, so the rule of exact matching, applied to space characters inside an id, may cause trouble unless the '<div id="foo bar" >' case is considered thoroughly.
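To make the escaped-space point concrete, here is a minimal Python sketch. The helper name `fragment_matches_id` and the example URL are my own assumptions, and real user agents follow their own fragment-handling algorithms; the sketch only shows that a percent-encoded space in a fragment is a different string from the literal space in an id until something decodes it.

```python
from urllib.parse import unquote, urlsplit


def fragment_matches_id(url: str, element_id: str, decode: bool) -> bool:
    """Compare a URL's fragment with an id value (hypothetical helper).

    With decode=False the comparison is exact, character by character.
    With decode=True the fragment is percent-decoded first.
    """
    fragment = urlsplit(url).fragment  # urlsplit leaves %20 untouched
    if decode:
        fragment = unquote(fragment)
    return fragment == element_id


url = "http://example.com/page.html#foo%20bar"  # illustrative URL
print(fragment_matches_id(url, "foo bar", decode=False))  # False
print(fragment_matches_id(url, "foo bar", decode=True))   # True
```

So "exact matching" only works for '<div id="foo bar" >' if it is also stated at which point, if any, the fragment gets decoded.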

Now let's say, instead, that a user agent conforming to the HTML 5 specification must cut off any token after the first one (I know that "foo bar" is currently taken as is), so that <div id="foo bar"> becomes <div id="foo "> and <div id=" foo "> is valid too. In that case, also skipping any spaces, and defining the same behaviour for strings passed to .getElementById(), could be a nice form of graceful degradation for documents that do not conform to the rule "the value [of an id attribute] must not contain any space characters", but it might fail with CSS selectors such as 'div[id="foo bar"]'.
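A rough Python sketch of this "keep only the first token" idea follows. It is an illustration of the hypothetical rule, not of what any browser does, and all function names are my own. It assumes the space characters are space, tab, LF, FF and CR.

```python
import re


def first_token(value: str) -> str:
    """Return the first run of non-space characters, or '' if there is none."""
    parts = [p for p in re.split("[ \t\n\f\r]+", value) if p]
    return parts[0] if parts else ""


def lookup_matches(stored: str, argument: str) -> bool:
    """getElementById-like comparison, truncating the argument the same way."""
    return stored != "" and stored == first_token(argument)


def attribute_selector_matches(stored: str, selector_value: str) -> bool:
    """[id="..."]-like comparison: exact string equality, no normalisation."""
    return stored == selector_value


authored_id = " foo bar"              # what the author wrote
stored_id = first_token(authored_id)  # what the hypothetical parser keeps

print(stored_id)                                         # foo
print(lookup_matches(stored_id, "foo bar"))              # True
print(attribute_selector_matches(stored_id, "foo bar"))  # False
```

The lookup degrades gracefully because both sides are truncated, while the attribute selector compares against the authored string and no longer matches.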

Perhaps a compromise, if acceptable for backward compatibility, might be:
- when the id value must be compared to a fragment identifier, strip any trailing space characters; if the match fails, escape any other space characters both in the id value and in the fragid and try again;
- when an attribute is defined to hold a URL and its value has spaces in its path/query/fragment, escape them before resolving the URL (not sure if needed);
- for the purpose of ID matching through the DOM 'getElementById' method, leave the id value untouched;
- for the purpose of ID matching through CSS selectors accessing it as an attribute, leave the id value untouched;
- for the purpose of ID matching through CSS selectors directly accessing it (e.g. '#foo'), either choose the first sequence of non-space characters or let the match fail (I can't decide which is better, but perhaps the former would fail as well, since I guess anyone coding <div id="foo bar"> not only as a fragment identifier but also for styling might have the nice idea of writing "#foo bar { font-weight : bold; }" as well).
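The matching rules above can be sketched in Python as below. All function names are hypothetical, and I am making some assumptions the list leaves open: trailing spaces are stripped from the id value only, "escaping" means replacing U+0020 with %20, and the '#foo' case uses the "first sequence of non-space characters" option rather than a failed match.

```python
import re

SPACE_CHARS = " \t\n\f\r"  # assumed set of space characters


def escape_spaces(value: str) -> str:
    """Percent-encode literal U+0020 spaces only (a simplification)."""
    return value.replace(" ", "%20")


def matches_fragment(id_value: str, fragid: str) -> bool:
    """Rule 1: strip trailing spaces, then retry with spaces escaped."""
    trimmed = id_value.rstrip(SPACE_CHARS)
    if trimmed == fragid:
        return True
    return escape_spaces(trimmed) == escape_spaces(fragid)


def matches_get_element_by_id(id_value: str, argument: str) -> bool:
    """Rule 3: the id value is left untouched, so the match is exact."""
    return id_value != "" and id_value == argument


def matches_attribute_selector(id_value: str, selector_value: str) -> bool:
    """Rule 4: [id="..."] also sees the untouched value."""
    return id_value == selector_value


def matches_id_selector(id_value: str, ident: str) -> bool:
    """Rule 5, first option: '#ident' matches the first token of the id."""
    parts = [p for p in re.split("[ \t\n\f\r]+", id_value) if p]
    return bool(parts) and parts[0] == ident


print(matches_fragment("foo bar ", "foo%20bar"))          # True
print(matches_get_element_by_id("foo bar", "foo bar"))    # True
print(matches_attribute_selector("foo bar", "foo"))       # False
print(matches_id_selector("foo bar", "foo"))              # True
```

Note that under these assumptions the same element answers to "foo bar" through the DOM and to "foo" through '#foo', which is exactly the kind of inconsistency that makes me unsure about the last rule.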

Anyway, if the id value is also a fragment identifier, which might contain space characters (since the parsing rules prescribe adding such characters to the unreserved production), does the (authoring) rule "the value must not contain any space characters" make sense?

Now let's come to the duplicated ids issue. Again, what's what? When it's said, "The id attribute represents its element's unique identifier. *The value must be unique in the subtree within which the element finds itself and must contain at least one character.*", I think that's what you call an authoring rule. So I don't think it was so bad to ask for a clarification on the nature of the subtree. And if a subtree eventually happened to match an element subtree inside a document, was the suggestion of a getElementById method on the HTMLElement interface so awful? Otherwise, let's consider (again) the second paragraph:

"If the value is not the empty string, user agents must associate the element with the given value (exactly, including any space characters) *for the purposes of ID matching within the subtree the element finds itself (e.g. for selectors in CSS or for the |getElementById()| method in the DOM).*"

It's a parsing rule, isn't it? But it also says that the id must be unique in the whole document for the purpose of ID matching through the getElementById() method in the DOM, because the only object capable of getting an element by its id is an instance of the Document interface. So a choice has to be made about what to do with duplicated ids. Solving the question at the parser level (i.e. defaulting any duplicated id to the empty string) would be consistent with both the fragment identifier behaviour (only the first occurrence is valid) and the uniqueness rule, but might break some semantics (e.g. a hyperlink used to create an instance of a <dfn>, or a <blockquote> with a cite attribute referencing a <cite> element, both with a duplicated id that is not the first occurrence). On the other hand, leaving the duplicated id in the document requires some changes to the Document's getElementById() method, since the W3C DOM Core does not define a single behaviour in such a case, and I've expressed a few doubts about solving this by adding an equivalent method on the HTMLDocument interface. In any case, the getElementById() behaviour must be defined for such situations, and having it pick the first match may be a solution. That might cause side effects or unwanted effects if misused in actual documents, and it leaves no way to directly access any element with a duplicated id; but if I'm not careful when choosing an ID, I can only blame myself. Fulfilling the uniqueness requirement might still become problematic when dynamically putting together pieces of code, perhaps from different sources (e.g. using XMLHttpRequest, or because of externally syndicated content), but this falls within the scope of careful programming.
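As an illustration of the "pick the first match" option, here is a small Python model of a document tree. It is only a sketch under my own assumptions: the `Node` class and both functions are hypothetical, "first" is taken to mean first in depth-first, pre-order traversal, and starting the lookup from an inner node stands in for the element-level method I suggested.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    tag: str
    id: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and its descendants in depth-first, pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def get_element_by_id(root: Node, wanted: str) -> Optional[Node]:
    """Return the first node in tree order whose id equals 'wanted'."""
    if wanted == "":
        return None
    for node in walk(root):
        if node.id == wanted:
            return node
    return None


# Two elements share the id "foo"; the second one sits inside <section>.
section = Node("section", children=[Node("p", id="foo")])
document = Node("body", children=[Node("div", id="foo"), section])

print(get_element_by_id(document, "foo").tag)  # div
print(get_element_by_id(section, "foo").tag)   # p
```

From the document root the <p> is unreachable by id, whereas a lookup scoped to the <section> subtree finds it.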

From the point of view of CSS, both choices may be consistent with coupled rules such as "#foo { font-size : 13; }" and "#foo { font-size : 14; }", since both would refer to the same element because of the cascading rules; on the other hand, something like 'div[id="foo"] {/*something here*/}', or a direct reference to an ID selector as a descendant of different elements, might perhaps isolate different elements in the document (whether to allow this or not is outside the scope of HTML, but are such cases found in the wild?), and for the purpose of compatibility with documents styled that way, leaving duplicated ids in the document would be the better choice. But, in such cases, shouldn't DOM element selection be consistent with CSS element selection (i.e. to avoid side effects when CSS rules manipulate the DOM itself)? That is, if through CSS it were possible to reach elements with duplicated ids in different subtrees of a document tree (taking all the descendants of a non-leaf node to be part of its subtree) and to manipulate their content, shouldn't that be possible through the DOM too?
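A toy Python model may help show how such selectors could isolate elements that share an id. Everything here is an assumption made for illustration: the `Element` tuple, the two selection functions, and the flat list standing in for a document are not how any CSS engine works.

```python
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    path: Tuple[str, ...]  # tag names of the ancestors, outermost first
    tag: str
    id: str


def select_by_tag_and_id(elements: List[Element], tag: str, wanted: str) -> List[Element]:
    """Roughly what a selector like tag[id="wanted"] would pick."""
    return [e for e in elements if e.tag == tag and e.id == wanted]


def select_id_under(elements: List[Element], ancestor: str, wanted: str) -> List[Element]:
    """Roughly what a descendant selector like 'ancestor #wanted' would pick."""
    return [e for e in elements if ancestor in e.path and e.id == wanted]


# Two elements with the duplicated id "foo", listed in tree order.
doc = [
    Element(("html", "body", "header"), "div", "foo"),
    Element(("html", "body", "footer"), "span", "foo"),
]

first_match = next(e for e in doc if e.id == "foo")  # first-match DOM lookup

print(first_match.tag)                                   # div
print(select_by_tag_and_id(doc, "span", "foo")[0].tag)   # span
print(select_id_under(doc, "footer", "foo")[0].tag)      # span
```

In this model the style rules can reach the second element, while a first-match lookup by id alone never can.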

Anyway, I'm not that confused, no more than usual :-P

BR, Alex.

