[whatwg] id and xml:id
Since UAs handle whitespace in the id attribute inconsistently (see below), since old specs imply or require whitespace trimming, and since ids containing whitespace cannot be referenced from whitespace-separated lists of ids, I suggest adding the following language concerning document conformance:

The value of the id attribute must be a string that consists of one or more characters matching the following production: [#x21-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] (any XML 1.0 character excluding whitespace).

Also, I suggest requiring that elements must not have both id and xml:id, and requiring that xml:id must not occur in the HTML serialization. (Again, from the document conformance point of view, not disputing requirements on browsers.)

Rationale: HTML doesn't have namespace processing of colonified names, and the xml:id spec is not designed for HTML. Allowing xml:id in HTML feels intuitively wrong (perhaps even a bit evil :-).

If an element had both an id attribute and an xml:id attribute with different values, the document would not be HTML-serializable, which would be bad. (Obviously, even with only one kind of ID attribute on an element, in round tripping from XHTML to HTML to XHTML, the information about whether the original attribute was id or xml:id is lost, just like the information about whether a table had a tbody is lost.)

If an element was allowed to have an id attribute and an xml:id attribute with the same value, the following constraint from the xml:id spec would be violated even for conforming docs:

"An xml:id processor should assure that the following constraint holds: * The values of all attributes of type “ID” (which includes all xml:id attributes) within a document are unique." ( http://www.w3.org/TR/xml-id/ )

Assuming, of course, that the XHTML5 id can still be considered an ID in the XML sense.

Finally, as the ultimate ID nitpicking, the spec should state that it is naughty of authors to turn attributes other than id and xml:id into IDs via the DTD.
(Well, using a DTD at all is naughty. :-)

Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html

The script tries every id with a whitespaceless value to see if whitespace is trimmed before ID assignment.

Firefox:
id='a' PASS
id='2' PASS
id='&lt;' PASS
id=',' PASS
id='&auml;' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='&#13;f&#13;' PASS

Opera (weekly build 3312; note that Opera recently changed its behavior to match the others with id=' c '):
id='a' PASS
id='2' PASS
id='&lt;' PASS
id=',' PASS
id='&auml;' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='&#13;f&#13;' FAIL

Safari and IE 6:
id='a' PASS
id='2' PASS
id='&lt;' PASS
id=',' PASS
id='&auml;' PASS
id=' c ' FAIL
id='\nd\n' FAIL
id='\t\te\t\t' FAIL
id='&#13;f&#13;' FAIL

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
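Henri's proposed production, and his point that an id containing whitespace can never be matched by a token from a whitespace-separated list of ids, can be illustrated with a minimal Python sketch. The function names are hypothetical, and "whitespace" here is assumed to mean the XML S production (space, tab, CR, LF); this is an illustration, not a conformance checker.

```python
import re

# XML white space (the S production): space, tab, CR, LF.
_XML_SPACE = re.compile(r"[ \t\r\n]+")


def is_conforming_id(value: str) -> bool:
    """Proposed rule: one or more XML 1.0 characters, none of them white space.

    Allowed ranges: [#x21-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    """
    if not value:
        return False
    for ch in value:
        cp = ord(ch)
        if not (0x21 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD
                or 0x10000 <= cp <= 0x10FFFF):
            return False
    return True


def idref_tokens(value: str) -> list[str]:
    """Split a whitespace-separated list of id references into tokens."""
    return [tok for tok in _XML_SPACE.split(value) if tok]
```

For example, `idref_tokens(" c ")` yields `["c"]`, so an element whose id is `" c "` cannot be reached from such a list, and `is_conforming_id(" c ")` rejects that id up front.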
Re: [whatwg] id and xml:id
Quoting Henri Sivonen [EMAIL PROTECTED]:
> Also, I suggest requiring that elements must not have both id and xml:id and requiring that xml:id must not occur in the HTML serialization. (Again, from the document conformance point of view-- not disputing requirements on browsers.)

How could it occur in an HTML document? (Given that the browser in question is namespace aware.) I'm assuming here that we're not talking about adding stuff through the DOM, given that you talk about serialization.

> Rationale: HTML doesn't have namespace processing of colonified names and the xml:id spec is not designed for HTML. Allowing xml:id in HTML feels intuitively wrong (perhaps even a bit evil :-).

I agree.

> If an element had both an id attribute and an xml:id attribute with different values, the document would not be HTML-serializable, which would be bad.

Now, I agree that's bad, but I think there is something to be said for elements having multiple IDs. (Even though that's not valid for some definition of it.)

> (Obviously, even with only one kind of ID attribute on an element, in round tripping from XHTML to HTML to XHTML, the information about whether the original attribute was id or xml:id is lost just like the information about whether a table had a tbody is lost.)

Interesting point. Personally, I think tbody should be required in XHTML to counter that. It just doesn't make sense the way it is now.

> If an element was allowed to have an id attribute and an xml:id attribute with the same value, the following constraint from xml:id spec would be violated even for conforming docs: An xml:id processor should assure that the following constraint holds: * The values of all attributes of type “ID” (which includes all xml:id attributes) within a document are unique. ( http://www.w3.org/TR/xml-id/ ) Assuming, of course, that the XHTML5 id can still be considered an ID in the XML sense.

It should be considered an ID in the XML sense for getElementById and friends.
> Finally, as the ultimate ID nitpicking, the spec should state that it is naughty of authors to turn attributes other than id and xml:id into IDs via the DTD. (Well, using a DTD at all is naughty. :-)

But doing it through DOM methods is OK? (I agree that DTDs are obsolete...)

> Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html

Interesting testcase!

> Opera (weekly build 3312; note that Opera recently changed its behavior to match the others with id=' c '):

Bah. I hope we can revert that...

Do you have a similar test for xml:id? Opera passes (passed?) the following, for example: http://annevankesteren.nl/test/xml/xml-id/008.xml

Cheers,

Anne

--
Anne van Kesteren
http://annevankesteren.nl/
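The two document-conformance rules debated in this exchange (no element carrying both id and xml:id, and the xml:id spec's constraint that all ID values in a document are unique) lend themselves to a mechanical check. A minimal Python sketch, assuming a well-formed XHTML string parsed with `xml.etree.ElementTree` and assuming that both id and xml:id count as XML IDs; the function name is hypothetical and this is not a full conformance checker.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the predeclared XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def id_conformance_errors(xhtml: str) -> list[str]:
    """Report elements with both id and xml:id, and duplicate ID values."""
    root = ET.fromstring(xhtml)
    errors = []
    values = []
    for el in root.iter():
        has_id = "id" in el.attrib
        has_xml_id = XML_ID in el.attrib
        if has_id and has_xml_id:
            errors.append(f"{el.tag}: has both id and xml:id")
        # Assumption: both attributes are of type ID, so both feed the
        # uniqueness check.
        if has_id:
            values.append(el.attrib["id"])
        if has_xml_id:
            values.append(el.attrib[XML_ID])
    for value, count in Counter(values).items():
        if count > 1:
            errors.append(f"duplicate ID value {value!r}")
    return errors
```

An element such as `<p id="a" xml:id="a"/>` trips both rules at once, which is the situation Henri describes: even identical values break ID uniqueness.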
Re: [whatwg] id and xml:id
Henri Sivonen wrote:
> Since UAs handle whitespace in the id attribute inconsistently (see below), old specs imply or require whitespace trimming and ids with whitespace are unreferencable from whitespace-separated lists of ids, I suggest adding the following language concerning document conformance: The value of the id attribute must be a string that consists of one or more characters matching the following production: [#x21-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] (any XML 1.0 character excluding whitespace).

I'd rather see the id attribute restricted to an NCName token insofar as possible. We can make an exception for Hixie's repetition templates, but otherwise I think it should be compatible with the XML ID syntax. So xsd:id { pattern: \S*; }

The concept of IDness is a useful one for many tools, and even if browsers don't care what characters there are, other tools do. We can't express IDness in a schema if we insist on ignoring its syntactic restrictions.

~fantasai
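fantasai's NCName restriction, together with a repetition-template exception, can be sketched as Python regular expressions. The template variant follows the bracket production Henri gives later in the thread: one optional segment delimited by [ and ], or by the XML-ID-compatible U+02D1 and U+00B7, which are removed from the ordinary character classes so they can only act as delimiters. Assumption: the character classes are taken from the XML 1.0 Fifth Edition NameStartChar/NameChar productions (with ':' dropped), which are broader than the XML 1.0 Letter and NCNameChar tables cited in the thread, so a few characters classify differently. All names are hypothetical.

```python
import re

# NCName classes from the XML 1.0 Fifth Edition Name productions, minus ':'.
# Assumption: these stand in for the older Letter/NCNameChar tables.
START = (
    r"A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    r"\u200C\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF"
    r"\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
REST = START + r"\-.0-9\xB7\u0300-\u036F\u203F\u2040"

NCNAME = re.compile(f"[{START}][{REST}]*")

# Template variant: U+02D1 is cut out of the start class and U+00B7 is left
# out of the rest class, so both can only serve as bracket substitutes.
START_T = START.replace(r"\xF8-\u02FF", r"\xF8-\u02D0\u02D2-\u02FF")
REST_T = START_T + r"\-.0-9\u0300-\u036F\u203F\u2040"
TEMPLATE_ID = re.compile(
    f"[{START_T}][{REST_T}]*"                        # (Letter | '_') NCNameChar*
    f"(?:[\\[\u02D1][{REST_T}]+[\\]\u00B7])?"        # optional [segment]
    f"[{REST_T}]*"                                   # trailing NCNameChar*
)


def is_ncname(value: str) -> bool:
    return NCNAME.fullmatch(value) is not None


def is_template_id(value: str) -> bool:
    return TEMPLATE_ID.fullmatch(value) is not None
```

As the thread notes, the bracketed form also rejects mismatched brackets: `is_template_id("row[item]")` is true, while `is_template_id("row[item")` is false.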
Re: [whatwg] id and xml:id
Quoting fantasai [EMAIL PROTECTED]:
> I'd rather see the id attribute restricted to an NCName token insofar as possible. We can make an exception for Hixie's repetition templates, but otherwise I think it should be compatible with the XML ID syntax.

I agree. Note that the repetition template also allows for characters that are compatible with XML ID. Of course, this is only for valid documents... All things should still be defined in a way that takes into account invalid, yet well-formed, documents as well. (And HTML documents...)

--
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] id and xml:id
On Apr 2, 2006, at 19:26, Anne van Kesteren wrote:
> I agree. Note also that the repetition template also allows for characters that are compatible with XML ID. Of course, this is only for valid documents... All things should still be defined in a way that they take into account invalid, yet well-formed, documents as well. (And HTML documents...)

I am interested in conforming DTD-invalid well-formed XHTML documents and conforming HTML documents. I think that whatever is allowed as an id attribute value in conforming HTML documents should also be conforming as a value of an id attribute in XHTML (but not necessarily conforming as an xml:id value) in order to allow XHTML-serializability of conforming HTML docs. (I am not interested in DTD-valid documents. I consider DTDs harmful.)

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
Re: [whatwg] id and xml:id
Henri Sivonen wrote:
> On Apr 2, 2006, at 18:56, fantasai wrote:
>> I'd rather see the id attribute restricted to an NCName token insofar as possible. We can make an exception for Hixie's repetition templates, but otherwise I think it should be compatible with the XML ID syntax.
>
> Do you mean common attrs should have a co-occurrence constraint that changes the datatype of the id attribute if the repeat attribute is present?

Yes. Or, at the very least, if the repetition module is loaded.

> I was planning on defining the datatype of the id attribute as xsd:string { pattern = \S+ }. An NCName that also allows [ and ] will be one huge regexp. (But doable, of course.) If that is what we want, the syntax should probably be
>
> (Letter | '_') (NCNameCharWithout02D1and00B7)* (('[' | #x02D1) (NCNameCharWithout02D1and00B7)+ (']' | #x00B7))? (NCNameCharWithout02D1and00B7)*
>
> with the XML 1.0 definitions of Letter and NCNameChar.

Cool, that would even catch mismatched brackets. :)

>> The concept of idness is a useful one for many tools, and even if browsers don't care what characters there are, other tools do. We can't express IDness in a schema if we insist on ignoring its syntactic restrictions.
>
> I didn't bother to make that argument, because I thought changing the language to fit schemas wouldn't go down well with Hixie. :-) (In http://hsivonen.iki.fi/lists-in-attributes/ I tried to bring a general "less code and more reuse of correct code" argument into it instead of only playing the "it's incompatible with my schema language of choice" argument.)

It's not my schema language of choice; it's the top three (by a long shot) schema languages in use for XML.

> I wasn't even expecting to be able to do IDREF integrity checks in RELAX NG. I was planning on doing it in Schematron or Java. Besides, general IDREF integrity checking does not check that, for example, the form attribute references only form elements and not just any ids.
I would want that in the RelaxNG schema, because there are editing tools that hook into RelaxNG, but not many (or any besides validators) that can hook into Schematron. (Glazou, for example, is working on a RelaxNG-driven editor.)

RelaxNG /can/ do IDREF integrity checks. The part about form attributes referencing only form elements can be checked by Schematron. From an authoring standpoint, the *most* useful part of IDREF integrity checking is to check against typos, not against misinterpretation of the idref attribute's intent. :)

~fantasai
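The two kinds of IDREF check contrasted here can both be sketched in a few lines of Python, the "Schematron or Java" style of check Henri mentions rather than anything RelaxNG itself does. The first kind catches typos (a reference to an id that does not exist). The second catches references to the wrong kind of element (a form attribute pointing at something other than a form). Assumptions: the document is well-formed XHTML parsed with `xml.etree.ElementTree`, the form attribute is treated as a whitespace-separated list of ids, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def form_idref_errors(xhtml: str) -> list[str]:
    """Check every form attribute: the id must exist and name a form element."""
    root = ET.fromstring(xhtml)
    by_id = {el.attrib["id"]: el for el in root.iter() if "id" in el.attrib}
    errors = []
    for el in root.iter():
        # Assumption: treat the attribute as a whitespace-separated id list.
        for ref in el.attrib.get("form", "").split():
            target = by_id.get(ref)
            if target is None:
                # Generic IDREF integrity: catches typos.
                errors.append(f"form={ref!r}: no element with that id")
            elif target.tag != XHTML + "form":
                # Element-type check that generic IDREF checking misses.
                local = target.tag.split("}")[-1]
                errors.append(f"form={ref!r}: refers to <{local}>, not <form>")
    return errors
```

A plain ID/IDREF check would flag only the dangling reference; the element-type test is the extra constraint Henri says generic checking cannot express.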
[whatwg] Textareas
I've run into some issues with textareas, and after checking http://www.w3.org/Submission/web-forms2/ and seeing that the mailing list is active...

1. I always thought that cols, which has been around forever, was advisory regarding width, in the sense that if there was no other overriding factor (CSS style settings), then cols would dictate the width of the textarea, and that would be the end of it. I further thought that .wrap=hard meant that the newlines sent to the server reflected exactly what the user saw (how the text was arranged in the textarea), in the sense that there was exactly one newline (%0D%0A) between each adjacent (possibly empty) pair of lines. Indeed, this is the behaviour that IE 6 exhibits on my Win XP Pro system. So when I saw that the linefeeds FF was putting in did not reflect what I was seeing on the screen, I filed a bug report against it.

Only afterwards did I come back to web-forms2 to review it, and I was shocked by what I read. To paraphrase, it seems that cols is no longer simply advisory for (only) determining the width of the textarea. With .wrap=hard, it says that the breaks should be dictated by the value of cols, and that if cols doesn't exist, the breaks should be dictated by the size of the textarea (I presume that's what "display width" means). It further says that this is not a good thing anyway, because users have different size displays, so everyone's wrapping position would be different, defeating the purpose of client-side wrapping.

Now, I am really wondering about that whole paragraph, my paraphrasing notwithstanding. First of all, "defeating the purpose of client-side wrapping" raises the question: what is the purpose of .wrap? It's pretty clearly not something for the client side, right? There is no visible difference to the user while working with a textarea (there would be with .wrap=off, but that is not covered here).
So if it's not useful for the client, it should be useful for the server. How? Breaking text up every cols characters is trivial to do on the server side, so there is little purpose in having the client do it. On the other hand, it is very useful to know what the user was seeing, so that what is processed on the server side has some correlation to what the user submitted. This is the dichotomy in how .wrap has been used so far: either delivering what the user saw (.wrap=hard) or what the user intends to be seen (.wrap=soft), making it useful for both sides. You can't deduce on the server side what the user saw on the client side, even assuming you know by other means how wide the textarea is (the user might not even be using a fixed-width font), so .wrap=hard serves a very useful purpose here. And I don't see that the argument is any different whether or not cols has been set.

In addition, there are millions of sites out there with .cols set because it must be (cols has been mandatory for so long). These people will be in for a rude awakening indeed when they find out that cols now means something completely different. But I think I'd be even more annoyed as a user, it being as if I was chopped off in mid

In short, I have outlined a compelling reason to have .wrap behave as it does in IE6, versus a passing comment about the purpose of client-side wrapping. So I am asking where this most peculiar mandate about .wrap=hard came from, and expressing my strong disagreement with what I understand so far.

2. As long as I am writing, I may as well ask about another textarea issue whose absence has always seemed strange. Why is there no way to find out what will actually be transmitted from a textarea? It seems to me that the client (JavaScript) might be at least as interested in this as the server CGI. It would be useful to know what row and column one was on in the textarea.
Of course, it is possible to know what row one is on within the .value (because you can figure out where the caret is, and then count preceding \n's), but with wrapping, these two are different, and you don't really know for sure where things are being wrapped. Therefore, it would be exceptionally useful to have something like .observedValue to reflect what is being seen. In IE, I think I can figure this out via some range monkeying about (since they allow for rangeHeight (or something like that)), but I am stumped with Mozilla/FF.

Csaba Gabor from Vienna
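The two textarea computations this post discusses can be sketched in Python. The function names are hypothetical, and this is not any browser's algorithm. `hard_wrap_submission` applies the cols-based reading of wrap=hard that the post paraphrases from Web Forms 2: it breaks each logical line so none exceeds cols characters and joins lines with CRLF (%0D%0A). It assumes `textwrap`'s word-breaking rules as a stand-in for a UA's. `logical_row_col` does the "count preceding newlines" computation on a caret offset, such as the textarea's selectionStart in Mozilla. Neither can recover where the textarea soft-wrapped on screen, which depends on rendered width and font rather than cols; that is the poster's point.

```python
import textwrap


def hard_wrap_submission(value: str, cols: int) -> str:
    """Re-break each logical line at word boundaries so that no line exceeds
    `cols` characters, then join the lines with CRLF as a submission would."""
    lines = []
    for line in value.splitlines():
        # textwrap.wrap returns [] for an empty line; keep it as a blank line.
        lines.extend(textwrap.wrap(line, width=cols) or [""])
    return "\r\n".join(lines)


def logical_row_col(value: str, caret: int) -> tuple[int, int]:
    """0-based (row, column) of a caret offset in terms of logical lines,
    found by counting newline characters before the caret. Soft wraps are
    not represented in the value, so visual rows cannot be derived here."""
    row = value.count("\n", 0, caret)
    line_start = value.rfind("\n", 0, caret) + 1
    return row, caret - line_start


print(hard_wrap_submission("aaa bbb ccc", 7).split("\r\n"))  # ['aaa bbb', 'ccc']
print(logical_row_col("ab\ncd", 4))  # (1, 1)
```

Both functions take only a few lines, which supports the post's claim that cols-based breaking is trivial to do on the server; what cannot be reconstructed anywhere except the client is the visual layout.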