Re: [whatwg] More comments and questions on Web Apps 1.0
On Fri, 1 Jun 2007, Henri Sivonen wrote:

I have no idea which section that was, nor which RFC that is (the URI is now dead). Is there an updated link?

The section is now 3.17.1.1. Script languages. (The section numbering in the email you quoted is from the 2006-02-24 revision of the spec.) The linked draft has become http://www.ietf.org/rfc/rfc4329

Ah, indeed, that would be a good place to reference that. Noted.

2.20.1. When I read this, I had trouble organizing (in my mind) what I was reading because I had no prior understanding of where the spec was going. Up to this point, I had had prior hypotheses that were confirmed or disconfirmed by the spec. This section would be a lot easier to read if it had an introductory paragraph stating the relationship of rendering, the DOM, the data model object and data submission. (Is the DOM being rendered or is a replaced widget element being rendered? Is it stylable? Is the data model reflected back to the DOM? What's the expected way of serializing the data model and sending it back to the server?)

I don't know which section this is talking about.

It was about datagrid.

Is it better now?

I think the non-normative intro section still doesn't sufficiently cover the relationship to the DOM and the CSS frame tree.

The relationship to CSS will all be in the rendering section. I guess I don't really know what you think is needed in the intro section, I'm probably too close to it. Could you write some questions that you think an intro section should answer?

It wasn't clear to me why the spec specified datagrid as part of required UA functionality instead of e.g. Google shipping an Open Source JavaScript library that implements the whole thing using existing stuff available in browsers. Is this about particular native widgets? About performance?

Both of those, but also simply semantics (spelt accessibility for political correctness reasons).

I thought there might be a requirement that the content made sense as a data model.

I think that would be excessive. It might be a good idea, though. Do you think it should be further restricted?

Not necessary, I guess.

Ok.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] Return values of on* event handlers
On Thu, 31 May 2007, Bill Mason wrote:

Could you elaborate?

The current release of iCab (3.03) treats 'return 0' the same as 'return false'. On the other hand, none of these browsers do in my testing:

  IE 3, 4, 5.0, 5.5, 6, 7 (Windows)
  IE 5.2 (Mac)
  Netscape 4, 8 (Windows)
  Netscape 6, 7 (Mac)
  Mozilla 1.7, Firefox 1.5, Firefox 2 (Mac)
  Opera 3, 4, 5 (Windows)
  Opera 6, 7, 8, 9 (Mac)
  Safari 2.0.4 (Mac)

That seems pretty strongly a vote against it for me.

Thanks for the testing! I guess it's a bug in iCab.

--
Ian Hickson
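A minimal sketch of the behaviour the majority of the tested browsers show, assuming the rule is simply "only a literal false cancels". The function name handlerCancelsEvent is hypothetical and is not taken from the spec or from any browser.

```javascript
// Hypothetical helper: models how most of the browsers listed above
// interpret the return value of an on* event handler.
function handlerCancelsEvent(returnValue) {
  // Strict comparison, so falsy values such as 0, "" or undefined
  // are NOT treated as a request to cancel the default action.
  return returnValue === false;
}

console.log(handlerCancelsEvent(false)); // cancels
console.log(handlerCancelsEvent(0));     // does not cancel
```

Under this model iCab 3.03 would correspond to a loose check such as `!returnValue`, which is what makes 'return 0' behave like 'return false' there.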
Re: [whatwg] HTMLDocument.title and SVGDocument
On Sat, 10 Feb 2007, Anne van Kesteren wrote:

If HTMLDocument really is going to apply to every Document object... then at least HTMLDocument.title needs to somehow not clash with SVGDocument.title or do both or something.

I basically see two options: either HTMLDocument.title always wins, and you can get the other one using getFeature(); or they both get redefined to check the root element and dispatch to the other one if appropriate.

Suggestions?

--
Ian Hickson
Re: [whatwg] live image maps
On Sun, 11 Feb 2007, Alexey Feldgendler wrote:

On Sat, 10 Feb 2007 12:29:46 +0100, Anne van Kesteren [EMAIL PROTECTED] wrote:

I think the specification should be clearer about what happens when an area element is added or removed. When an img element is added with a usemap=. When the shape= attribute is altered, et cetera. Either by handling each case specifically or stating in general that the specified algorithms always apply or something.

The spec says: Image maps are live; if the DOM is mutated, then the user agent must act as if it had rerun the algorithms for image maps.

Isn't it implied that any modification to the DOM tree should be equivalent to replacing the document with another one which would parse into the resulting DOM? If not, maybe it's worth specifying that this holds true unless explicitly stated otherwise.

I don't think this is necessarily true. Replacing a document with another one has all sorts of implications that are quite complex.

--
Ian Hickson
Re: [whatwg] DOMTokenList versus DOMStringList
On Sun, 18 Feb 2007, Anne van Kesteren wrote:

FWIW: DOM Level 3 Core seems to define a DOMStringList (http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMStringList), but it seems far less useful than the proposed DOMTokenList. On the other hand, I suppose you could let DOMTokenList inherit from DOMStringList or something...

I don't really see the relationship... Why would we want to use DOMStringList for this?

--
Ian Hickson
Re: [whatwg] HTMLMediaElement.volume
On Fri, 23 Mar 2007, Anne van Kesteren wrote:

Wouldn't it be better if no INDEX_SIZE_ERR was raised but instead the previous value was retained? For consistency with CanvasRenderingContext2D.globalAlpha for instance. It's not really important, but I think that some consistency between the various APIs would be nice.

In general, actually, raising INDEX_SIZE_ERR is what the APIs do. So for consistency, volume is correct. globalAlpha, though, is not.

What do people think? Should we change the canvas globalAlpha attribute to raise an exception for out-of-range values? Do any browser vendors have an opinion?

--
Ian Hickson
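The two setter styles being compared can be sketched roughly as follows. This is only an illustration in plain JavaScript, assuming a valid range of 0.0 to 1.0 for both attributes; the class names ThrowingVolume and RetainingAlpha are hypothetical stand-ins and are not the real HTMLMediaElement or CanvasRenderingContext2D interfaces, and a plain Error named INDEX_SIZE_ERR stands in for the DOM exception.

```javascript
// Hypothetical stand-in for the "raise INDEX_SIZE_ERR" style (volume).
class ThrowingVolume {
  constructor() { this._volume = 1.0; }
  get volume() { return this._volume; }
  set volume(value) {
    // The negated range test also rejects NaN.
    if (!(value >= 0.0 && value <= 1.0)) {
      const err = new Error("volume out of range: " + value);
      err.name = "INDEX_SIZE_ERR"; // stand-in for the DOM exception code
      throw err;
    }
    this._volume = value;
  }
}

// Hypothetical stand-in for the "retain the previous value" style (globalAlpha).
class RetainingAlpha {
  constructor() { this._alpha = 1.0; }
  get globalAlpha() { return this._alpha; }
  set globalAlpha(value) {
    // Out-of-range values are silently ignored.
    if (value >= 0.0 && value <= 1.0) {
      this._alpha = value;
    }
  }
}

const media = new ThrowingVolume();
const ctx = new RetainingAlpha();
ctx.globalAlpha = 0.5;
ctx.globalAlpha = 7; // ignored, previous value kept
let caught = null;
try { media.volume = 7; } catch (e) { caught = e.name; }
console.log(ctx.globalAlpha, media.volume, caught);
```

The difference that matters to authors is that the first style surfaces the mistake at the point of assignment, while the second hides it.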
Re: [whatwg] datetime - dateTime
On Sat, 24 Mar 2007, Anne van Kesteren wrote:

The dateTime DOM attribute is spelled with an uppercase T: http://www.w3.org/TR/DOM-Level-2-HTML/html.html#ID-79359609

Fixed. Thanks.

On Sun, 25 Mar 2007, Nicholas Shanks wrote:

On 24 Mar 2007, at 16:57, Anne van Kesteren wrote:

The dateTime DOM attribute is spelled with an uppercase T: http://www.w3.org/TR/DOM-Level-2-HTML/html.html#ID-79359609

I just encountered that while implementing longdesc support. The IMG attribute is lower-case, the DOM attribute is longDesc. At least they are consistently inconsistent :-)

Indeed!

--
Ian Hickson
Re: [whatwg] Apply script.defer to internal scripts
On Tue, 27 Mar 2007, Kristof Zelechovski wrote:

I understand that the async attribute must depend on the src attribute because it is needed and meaningful only when the script element is loaded from an external source; however, the advantage of using the defer attribute is not limited to that case. Consider the following example:

  <script type="text/javascript" defer>
  function ha8validate(p5event) { return true }
  document.forms[0].onsubmit = ha8validate
  </script>

The script embedded here is so short and specific that it makes no sense relaying it to an external location; however, if the script is not deferred, the script fails with an exception at run time because the document body is not constructed yet. Therefore, the defer attribute can be meaningful without the src attribute and the dependency should be removed.

I have removed the dependency. You can now specify defer even without the src attribute. I've also removed the restriction for async, because you might want to run a set of scripts in a particular order, with one of them being external and async, and another being internal. The only way to guarantee the internal one runs immediately after the external one is to make the internal one async too.

On Thu, 29 Mar 2007, Gareth Hay wrote:

Does it not follow that to be more consistent, logical, better style, whatever, you should wrap your code in a function that is called onload? Isn't that what onload is for? Being triggered after the page has loaded?

This doesn't preclude us allowing the other.

On Thu, 29 Mar 2007, Alexey Feldgendler wrote:

How is this better than putting the script immediately before </body>, which already works today?

It might not be better, but that's not a reason to disallow it.

On Tue, 3 Apr 2007, Hallvord R M Steen wrote:

There is no real advantage to the defer attribute since in HTML4 it is only advisory, the UA is not required to actually defer the script execution, and some implementations only defer it until seeing the next SCRIPT element in the source. Relying on it the way your use case does will work in very few browsers, and specifying this in HTML5 would increase backwards incompatibility for a very minimal gain.

HTML5 defines it exactly.

On Tue, 3 Apr 2007, Stewart Brodie wrote:

My implementation will execute the script immediately if it was inline, and execute it as soon as the whole script is available (obtained from filesystem/network) otherwise. As far as I understood the specification, the DEFER simply indicates to the HTML parser that it can continue parsing the HTML without waiting to see if the script is going to insert additional content - i.e. the script will not use document.write (and friends).

HTML5 defines this more exactly than HTML4.

Cheers,
--
Ian Hickson
Re: [whatwg] Apply script.defer to internal scripts
On Thu, 29 Mar 2007, Matthias Bauer wrote:

What about the DOMContentLoaded event? It is supported by Mozilla and, apparently, Opera 9. Dean Edwards has a technique to make it work on IE, and jQuery supports it on Safari [1]. Is there any chance DOMContentLoaded will be part of HTML5?

On Thu, 29 Mar 2007, Dean Edwards wrote:

Seems to have been forgotten: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2005-April/003709.html

It wasn't forgotten. The spec defines it now.

--
Ian Hickson
Re: [whatwg] A few editing suggestions for the HTML5 spec
On Sun, 15 Apr 2007, Geoffrey Garen wrote:

Some of the algorithms in this specification, for historical reasons, require the user agent to pause until some condition has been met. While a user agent is paused, it must ensure that no scripts execute (e.g. no event handlers, no timers, etc). User agents should remain responsive to user input while paused, however.

How should a user agent respond to user input that would cause an event handler to fire, like clicking on a button?

On Sun, 15 Apr 2007, Kristof Zelechovski wrote:

Pressing a button when the user agent is in paused state should cause the button to remain pressed until the user agent wakes up and execution of the associated event handlers should be deferred.

On Sun, 15 Apr 2007, Geoffrey Garen wrote:

Pressing a button when the user agent is in paused state should cause the button to remain pressed until the user agent wakes up and execution of the associated event handlers should be deferred.

So, if I had N buttons in a page, does that mean that all N could potentially end up in a pressed state?

On Sun, 15 Apr 2007, Kristof Zelechovski wrote:

Methinks, if several buttons are pressed, the events should be placed in the queue and executed in the order of appearance. If the page is reloaded as a result of an event handler, all remaining events should be discarded as usual. I am not quite sure how to handle this situation because the user could end up pressing all of the buttons out of impatience in search for a button that works. It is irrelevant if the first button pressed causes a reload, otherwise the result may not meet the user's expectation. The user agent should indicate its busy state with an hourglass pointer, a status message, an animated icon, whatever in order to prevent this misunderstanding.

On Sun, 15 Apr 2007, Kristof Zelechovski wrote:

Yes, they could, just like storey buttons in the lift.

This seems like exactly the kind of thing that we should leave up to the user agents, so I haven't specified anything. I don't really see that, from an interoperability perspective, it really matters. If there's a case in which it matters, though, do let me know so we can specify it.

--
Ian Hickson
Re: [whatwg] activeElement
On Thu, 17 May 2007, Hallvord R M Steen wrote:

if WHATWG is defining document.activeElement, perhaps the WHAT spec should match IE's behaviour more closely on some points. I refer to: http://www.whatwg.org/specs/web-apps/current-work/#activeelement

* when the document is loaded, before any interaction activeElement is the body element (!) (probably not important, I doubt any site would rely on this)

I've made the default the body element instead of the root element, which does indeed seem to more accurately reflect IE's behaviour.

* activeElement is set after mousedown. (important, maybe implied by other stuff about focus handling? I didn't test keydown for e.g. tabbing but pretty sure the same applies.)

This flows from the fact that that's when focus is set. It would vary on a platform with different focusing semantics.

* it is set to the event's target if it is focusable (A, INPUT, BUTTON etc.), otherwise it is set to the event's target's .offsetParent (important, and the offsetParent stuff isn't covered in the current spec)

I couldn't reproduce this. In my testing, only positioned div and span elements were magical in this way. For example, click the uuu text on this test case and you'll see the offsetParent (written to the log) is the B element, but the activeElement is the DIV element.

http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cbody%20onclick%3D%22w%28event.srcElement.offsetParent.tagName%29%22%3E%0D%0A%3Cpre%3E%28...%29%3C/pre%3E%0D%0A%3Cdiv%20style%3D%22position%3Aabsolute%22%3Eddd%3Ci%3Eiii%3Cb%20style%3D%22position%3Aabsolute%22%3Ebbb%3Cu%3Euuu%3Cinput%3E%3C/u%3E%3C/b%3E%3C/i%3E%3C/div%3E%0D%0A%3Cscript%3E%0D%0A%20setInterval%28function%20%28%29%20%7B%0D%0A%20%20%20var%20pre%20%3D%20document.getElementsByTagName%28%27pre%27%29%5B0%5D%3B%0D%0A%20%20%20pre.firstChild.data%20%3D%20document.activeElement.tagName%3B%0D%0A%20%7D%2C%20100%29%3B%0D%0A%3C/script%3E

I haven't made the spec let positioned div and span elements get focused in this way, because whether an element is positioned or not should have no bearing on the semantics of the document.

* it keeps pointing to the same element until another interaction with the document sets it again (important)

That's already in the spec.

--
Ian Hickson
Re: [whatwg] Scripting Tweaks
On Sat, 19 May 2007, Maciej Stachowiak wrote:

May I suggest reproposing [DOMContentLoaded] for DOM 3 Events, then, since your former objection to it is withdrawn?

I can if you want, but I don't really see it as a feature that would be expected in DOM3 Events. DOM Events defines the event infrastructure; it doesn't define when and how each event is actually fired. The firing of the events in HTML is very closely tied to the rest of the HTML processing model.

--
Ian Hickson
Re: [whatwg] setting .src of a SCRIPT element
On Wed, 30 May 2007, Jonas Sicking wrote:

The reason I designed it this way was that it felt like the least illogical behavior. In general a document behaves according to its current DOM. I.e. it doesn't matter what the DOM looked like before, or how it got to be in the current state, it only matters what's in the DOM now.

[...]

For script things are a lot worse. If the contents of a script element is changed it is impossible to 'drop' the script that was there before. Once the contents of a script has executed, it can never be unexecuted. And since we can't undo what the script has already done, it feels weird to redo the new thing that you're asking it to do.

Another thing that would be weird would be inline scripts. How would the following behave:

  s = document.createElement('script');
  document.head.appendChild(s);
  for (i = 0; i < 10; i++) {
    s.textContent += "a" + i + " += 5;";
  }

Would you reexecute the entire script every time data was appended to the script? Would you try to just execute the new parts? Would you do nothing? IE gets around this problem by not supporting dynamically created inline scripts at all, which I think is a really bad solution.

So I opted for 'killing' script elements once they have executed, they become in effect dead elements. This felt simple and consistent.

I'm not sure what you mean when you say you need to keep track of them, and remove them from the document again. All you need to do every time you want to execute a script is to insert a new DOM element in the head of your page. It's not going to be a problem with having too many script elements in the document unless you start executing millions of scripts, at which point you'll have bigger performance issues.

On Thu, 31 May 2007, Jonas Sicking wrote:

I don't see that being able to reuse elements adds any value. Could you give an example where it does?

The global eval equivalent is an example. It's not much of an improvement over the cloneNode example but I'd like the performance to be as close to a plain eval as possible. Ability to switch type, charset, language attributes in chosen user agents may be useful for things like testing E4X support or ES4 support, or correct broken encodings. Ability to execute an external resource again may be useful.

All of these are already possible however, so I don't think they are strong use cases. If there aren't any strong use cases I think we should go with what's simple.

I agree with Jonas here (and I apologise for not seeming to have the other side of this conversation; I assume I put it into another folder and will get to it in due course).

I haven't changed the spec, since the spec describes what Jonas says. Please let me know if you disagree with this, especially if you find pages that break because of it.

--
Ian Hickson
Re: [whatwg] typos in HTMLElement IDL
On Sat, 2 Jun 2007, Anne van Kesteren wrote:

* tabindex -> tabIndex

Fixed.

* contenteditable -> contentEditable

Fixed.

* The irrelevant DOM attribute currently doesn't link because there's no dfn around its definition.

Fixed.

--
Ian Hickson
Re: [whatwg] HTMLDocument.title and SVGDocument
On Fri, 1 Jun 2007, Maciej Stachowiak wrote:

I basically see two options: HTMLDocument.title always wins, and you can get the other one using getFeature(), or, they both get redefined to check the root element and dispatch to the other one if appropriate. Suggestions?

I like the "check the root element" option. That way, one could use the exact same Document object implementation for SVG and XHTML, while remaining compatible with expected behavior for existing content in both languages. Compound documents where both the HTML and the SVG have a title are possible, but that seems obscure enough that a special DOM API to get both titles is probably unnecessary.

On Sat, 2 Jun 2007, Anne van Kesteren wrote:

Even in that case only one title can be the document title.

On Sat, 2 Jun 2007, Maciej Stachowiak wrote:

If we define which is the document title in such cases then both HTMLDocument and SVGDocument returning that seems better than separate results.

Done.

--
Ian Hickson
Re: [whatwg] Scripting Tweaks
On Mon, 4 Jun 2007, Maciej Stachowiak wrote:

I can if you want, but I don't really see it as a feature that would be expected in DOM3 Events. DOM Events defines the event infrastructure; it doesn't define when and how each event is actually fired. The firing of the events in HTML is very closely tied to the rest of the HTML processing model.

It also defines the names of events

Just referring to the event defines the name, so that's a non-issue.

their associated IDL interfaces, whether they bubble, whether they are cancellable, and so forth.

That is independent of the name, and determined when you fire the event. There are some events that sometimes bubble and sometimes don't, for instance. In fact I would say that it's up to the spec that fires the event to define whether they bubble, have default actions, etc.

DOM 3 Events defines this for the load event for instance.

I would argue that it shouldn't.

It seems to me that load and DOMContentLoaded can both be defined in ways that are independent of the specific markup language, and are equally deserving of being in DOM 3 Events itself. Certainly in a user agent that supports multiple markup languages, you'd want DOMContentLoaded to be dispatched for all of them under the same conditions.

I guess it makes sense to have a non-normative (because there's nothing to be normative about) repository somewhere that lists event names and which specs are using them, so that specs can remain consistent on the matter. But I don't think DOM3 Events need be it; it's a lot of work. For example, all the HTMLMediaElement events would need to be added to the list.

Finally, the reason it was left out in the first place was largely due to presumed lack of use cases, not because it was believed to require language-specific processing rules. And that argument was accepted largely due to your support for it. So it would be good to at least present the new info.

I really don't think that my input had much influence on the matter. This is the sum total of what I said about it:

| All in all I'm in agreement with the sentiment on this thread that
| DOMContentLoaded's use cases are unconvincing.

...and that still stands. However, a lot of people have indicated that there _are_ use cases, and that they really want support for this. Since browsers support it and aren't going to _remove_ support for it, we have to specify it, and hence it gets added to the spec.

--
Ian Hickson
Re: [whatwg] Arbitrary HTML in option-elements
On Tue, 30 Nov 2004, Olav Junker Kjær wrote:

Generally, I think it's a very good thing that the spec tries to define how to handle invalid HTML. Undefined and optional behavior in interpreting HTML is a bad thing IMHO. Maybe the rules for parsing invalid HTML (in HTML5) could be generalized, something like: [...]

Since we now have very specific parsing rules, this probably no longer really applies. Please let me know if you disagree with what the spec says today about handling invalid HTML (and option elements in particular).

--
Ian Hickson
Re: [whatwg] Lowercase attribute values
On Sun, 28 Aug 2005, Henri Sivonen wrote:

In XHTML there are attributes whose value must be in lowercase, although in HTML the value is case-insensitive. The most common example is the method attribute of the form element. But should rev and rel be lowercased?

Consider a piece of software that maps from the HTML flavor of HTML5 to the XHTML flavor and needs to decide which attribute values to lowercase. How should the decision be made? Based solely on the attribute name? (In which case 'type' would be interesting.) Based on both the element name and the attribute name? What is the recommended method for the author of such a piece of software for extracting the list of special cases from the spec?

There are no more differences between XHTML and HTML now as far as this goes, as far as I know. Please let me know if I missed one.

How should the lowercasing be performed? Using the locale-insensitive Unicode case data or for ASCII only treating non-ASCII as an error?

So long as you don't do it in the Turkish locale, it should be fine. I haven't really made the spec very clear on this yet, but there's a red box about it; it'll be dealt with in due course.

--
Ian Hickson
Re: [whatwg] Lowercase attribute values
On Mon, 29 Aug 2005, Henri Sivonen wrote:

On Aug 28, 2005, at 09:56, Henri Sivonen wrote:

How should the lowercasing be performed? Using the locale-insensitive Unicode case data or for ASCII only treating non-ASCII as an error?

On further reflection, it seems to me the latter has to be chosen at least for a parser that is intended to be used as a part of a conformance checker. Otherwise <input type=RADİO ...> would pass.

Good point. Ok, ASCII-only it is.

--
Ian Hickson
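A minimal sketch of the ASCII-only lowercasing agreed on above, assuming it only ever touches the letters A to Z and leaves every other character unchanged. The function name asciiLowercase is chosen here for illustration and is not quoted from the spec.

```javascript
// Lowercase only the ASCII letters A-Z. Unlike locale-aware or full
// Unicode case mapping, this never turns U+0130 (İ) into "i".
function asciiLowercase(value) {
  return value.replace(/[A-Z]/g, function (ch) {
    // ASCII upper and lower case letters differ by 32 (0x20).
    return String.fromCharCode(ch.charCodeAt(0) + 32);
  });
}

console.log(asciiLowercase("RADIO") === "radio");   // matches the keyword
console.log(asciiLowercase("RADİO") === "radio");   // does not match
```

With this approach a conformance checker comparing against the keyword "radio" rejects the RADİO spelling, which is the outcome Henri argues for.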
Re: [whatwg] how to handle minimised attributes in HTML5
On Wed, 27 Apr 2005, Henri Sivonen wrote:

On Apr 27, 2005, at 13:09, Ian Hickson wrote:

On Tue, 26 Apr 2005, Henri Sivonen wrote:

What do you suggest the parser layer of a text/html conformance checker say about <input checkbox ...>?

1. Silently treat as <input type=checkbox ...>?
2. Treat as <input type=checkbox ...> but warn?
3. Treat as <input checkbox=checkbox ...> causing an error to be reported on a higher layer?
4. Treat as fatal error in the parser?

5. Treat as <input checkbox="">

Why? XHTML requires boolean attributes to be represented as foo='foo'. If <input checked ...> was treated as <input checked='' ...>, one could not reuse XHTML schemas on top of a minimal text/html flavor parser.

XHTML no longer requires this. foo="" and foo="foo" are now defined equivalently.

The only exception, I believe, would be for <table border>, which would instead be treated as <table border=1>.

Do you mean <table border> should pass a conformance check?

No. The border="" attribute is not valid however it is written.

--
Ian Hickson
Re: [whatwg] Character References in HTML
On Thu, 13 Oct 2005, Lachlan Hunt wrote:

In HTML4, according to SGML rules, numeric character references in the range from &#128; to &#159; are defined as UNUSED, which makes them non-SGML characters. Strictly speaking, it's not an error to refer to these characters with character references (even the validator only issues a warning: reference to a non-SGML character); but, AIUI, neither SGML nor HTML4 assigns any meaning to them.

http://lachy.id.au/log/2005/10/char-refs

Technically, these character references should really refer to the Unicode control characters, but reality dictates otherwise for text/html, thanks to IE and countless (poorly written) books and tutorials. I, therefore, think the spec should say something along these lines:

In HTML, numeric and hexadecimal character references referring to code positions in the range from 128 to 159 (0x80 to 0x9F) should be re-mapped to code positions in the Unicode character repertoire according to the CP1252 to Unicode table [CP1252]. This does not apply to XHTML.

Done. (With a must, and with an explicit table, since CP1252 doesn't define all those characters.)

HTML documents must not use numeric or hexadecimal character references in this range, although browsers should support them for backwards compatibility. Authors should instead refer to the correct Unicode code position for these characters.

Done.

Also, I think this would also be a nice conformance requirement to see for authoring tools:

HTML Authoring tools should automatically convert these character references to either the equivalent Unicode code position or, if the file's encoding supports it, the character itself, according to the CP1252 to Unicode table [CP1252].

Not done, but it's redundant anyway since simply implementing the spec will do this automatically (the spec doesn't round-trip the out-of-range entities through the DOM).

None of that should apply to XHTML, since XML explicitly allows this range in the production for Char and, as far as I'm aware, no XHTML UA implements this buggy behaviour.

Indeed.

--
Ian Hickson
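The remapping discussed above can be sketched as a lookup table. This is an illustration only: the table below lists the code points that Windows-1252 itself assigns in the range 0x80 to 0x9F, and the five positions it leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through unchanged here as an assumption; the spec's own explicit table is the authority. The names CP1252_C1_MAP and remapNumericReference are hypothetical.

```javascript
// Windows-1252 assignments for the range 0x80-0x9F.
const CP1252_C1_MAP = {
  0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
  0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
  0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
  0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
  0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
  0x9E: 0x017E, 0x9F: 0x0178
};

// Map the number taken from a numeric character reference to the code
// point a text/html parser would use under the rule discussed above.
function remapNumericReference(codePoint) {
  if (Object.prototype.hasOwnProperty.call(CP1252_C1_MAP, codePoint)) {
    return CP1252_C1_MAP[codePoint];
  }
  // Assumption: anything not in the table is returned unchanged.
  return codePoint;
}

// &#146; is commonly written where a right single quotation mark is meant.
console.log(remapNumericReference(146).toString(16)); // 2019
```

As the thread notes, this applies to text/html only; an XML parser would keep the original code point.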
Re: [whatwg] Test suite: Embedded content
On Tue, 29 Nov 2005, Lachlan Hunt wrote:

At least in Gecko, we parse the contents of noembed, noscript, noframes, and iframe as CDATA when we're not going to be using their contents because in the past, we've had lots of problems with authors treating these tags like C's preprocessor directives; handling cases like <head><noscript><body>...</noscript><script>...</script><body> is extremely difficult (and then preserving round-tripping for editor gets to be a problem, and the list of problems goes on).

Ok, but how is equivalent markup handled in XHTML, where parsing obviously can't switch to CDATA?

Badly.

noembed is non-conforming and does nothing in XHTML.

noscript is non-conforming in XHTML and does nothing in XHTML.

noframes might be non-conforming. I haven't done anything with it yet.

iframe contents are non-conforming in XHTML. They would be hidden but are in the DOM and active.

--
Ian Hickson
Re: [whatwg] Serialize comments that contain --
On Mon, 30 Jan 2006, Simon Pieters wrote:

What should happen with comments that contain -- when you serialize the DOM into HTML? When serializing into XML it should result in a fatal error according to DOM3Core[1], but I guess that is not really desired for HTML. Should the comment be dropped? Should the serializer insert a space in between (- -)?

The spec says:

# If the element's contents are not conformant, it is possible that
# the roundtripping through innerHTML will not work. For instance, if
# the element is a textarea element to which a Comment node has been
# appended, then assigning innerHTML to itself will result in the
# comment being displayed in the text field. Similarly, if, as a
# result of DOM manipulation, the element contains a comment that
# contains the literal string "--", then when the result of
# serialising the element is parsed, the comment will be truncated at
# that point and the rest of the comment will be interpreted as
# markup. Another example would be making a script element contain a
# text node with the text string "</script>".

It currently says this for innerHTML. Should I say it in other places too? Or would something else be better?

--
Ian Hickson
Re: [whatwg] The problem of duplicate ID as a security issue
On Fri, 10 Mar 2006, Alexey Feldgendler wrote: Does the current version of the spec define what happens to elements with duplicate ID values? No. It's something we should consider for fixes to DOM3 Core, though. The problem of duplicate ID isn't just another issue where it's nice to have some well-defined error recovery just for uniformity. There are cases when duplicate IDs should be viewed as a security concern. Consider a script which augments the HTML page after it has been parsed by attaching event listeners to elements in the DOM tree, inserting new nodes into the tree etc. This is common practice, for example, for many web-based WYSIWYG editors. In this scenario, any method the script uses for identification of the DOM nodes subject to augmentation is vulnerable to possible spoofing by user-supplied content present on the same page. For example, imagine a script which finds a button by ID and attaches an event listener to it. A possible markup looks like this: <div> ...blog entry body... </div> <button id="addtomemories">Add this entry to memories</button> <script> document.getElementById('addtomemories').addEventListener('click', doSomeNiceAJAX); </script> So, a malicious blog author can make the following entry: I have found a <a href="#" id="addtomemories">cool website</a>. Depending on how the browser handles duplicate IDs, either or both of the following unwanted effects may occur: 1. Clicking the link in the blog entry adds the entry to memories list of the reader. 2. Clicking the real Add this entry to memories button does nothing. One can think of other examples, possibly more dangerous. Other methods of identification (by tag name, by class, by CSS selector as proposed recently) are also vulnerable. This kind of attack is hard to circumvent through use of HTML cleaners because id="addtomemories" looks like an innocent attribute, like an anchor for navigation. It's not that hard to avoid. You can whitelist what attributes are allowed (e.g. only id attributes consisting of "comment" followed by the comment number followed by 1 to 10 characters in the range a-z). Preventing such attacks by a HTML cleaner would require either making a full list of all forbidden IDs, class names etc, or imposing Draconian rules upon user-supplied content, completely disallowing such useful attributes like id and class. I'm not really convinced there's that much use in user-supplied IDs and classes, but the rules needn't be that draconian. The server could automatically prepend the commentN string to IDs and classes. To be safe, a server's cleaning code must whitelist everything -- elements, attribute names, attribute values, element contents, etc. It's not trivial, but that's no excuse for not doing it. Necessary but not sufficient. Duplicate IDs aren't caught by a validating parser, so custom code is needed to enforce many of the requirements. For example, if one was trying to ensure that all IDs are unique, then the ID values within the user-supplied code would have to be checked for duplicates among them, too. This is already the case, yes. -- Ian Hickson
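The whitelisting and prefixing ideas in this message might look something like the sketch below. It is not taken from any real cleaner; the function names and the exact pattern (the word comment, the comment number, then 1 to 10 characters in a-z) are assumptions drawn only from the wording of the reply above.

```javascript
// Hypothetical server-side helper: rewrites a user-supplied id to
// "comment" + N + suffix, or rejects it (returns null) when the
// user-chosen part is not 1 to 10 characters in the range a-z.
function namespaceUserId(rawId, commentNumber) {
  if (!Number.isInteger(commentNumber) || commentNumber < 0) {
    throw new RangeError("commentNumber must be a non-negative integer");
  }
  if (!/^[a-z]{1,10}$/.test(rawId)) {
    return null;
  }
  return "comment" + commentNumber + rawId;
}

// Checks whether an id already has the whitelisted shape for this comment.
function isAllowedUserId(id, commentNumber) {
  const pattern = new RegExp("^comment" + commentNumber + "[a-z]{1,10}$");
  return pattern.test(id);
}

// Returns the id values that occur more than once, since the thread notes
// that user-supplied ids must also be checked against each other.
function findDuplicateIds(ids) {
  const seen = new Set();
  const duplicates = new Set();
  for (const id of ids) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return Array.from(duplicates);
}
```

With this shape, the spoofed id from the example (addtomemories) is rejected outright because it is longer than ten characters, and even a short id can never collide with a page-level id that lacks the comment prefix.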
Re: [whatwg] Parsing Numeric Character References
On Sun, 12 Mar 2006, Lachlan Hunt wrote: [The spec] does not cover [entities for] the characters in the range from #x80 to #x9F, which have historically been treated as code points from the Windows-1252 repertoire, rather than the control characters from Unicode. AFAIK, this is already interoperably implemented in all browsers. Fixed. Characters in the range from #x01 to #x19 (except for whitespace characters) are not treated interoperably across platforms. On Windows, Firefox, IE and Opera all displayed characters from some repertoire I couldn't identify. But on Mac: all the browsers displayed either nothing or a box (a place holder character). I think these should all return U+FFFD. They return the appropriate control characters from Unicode. The reason they render on some platforms is that the fonts on some platforms (Windows in particular) have glyphs in those positions. The use of characters in either of these ranges should be an easy parse error. I've made the first set a parse error, since those actually don't roundtrip as one might expect. But the x01-x19 entities roundtrip fine, they just render funkily. We could define something special about these characters in the rendering section, but I don't think they should be parse errors. Do you agree? -- Ian Hickson
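As a rough illustration of the behaviour described in this message, the sketch below maps a numeric character reference to text. The table is deliberately partial (only a handful of the Windows-1252 reinterpretations are listed), and the function name and return shape are hypothetical rather than anything the spec defines.

```javascript
// Partial table, for illustration only: a few references in the range
// 0x80 to 0x9F and the Windows-1252 characters they are treated as.
const WINDOWS_1252_OVERRIDES = new Map([
  [0x80, 0x20ac], // EURO SIGN
  [0x85, 0x2026], // HORIZONTAL ELLIPSIS
  [0x91, 0x2018], // LEFT SINGLE QUOTATION MARK
  [0x92, 0x2019], // RIGHT SINGLE QUOTATION MARK
  [0x93, 0x201c], // LEFT DOUBLE QUOTATION MARK
  [0x94, 0x201d], // RIGHT DOUBLE QUOTATION MARK
  [0x96, 0x2013], // EN DASH
  [0x97, 0x2014], // EM DASH
  [0x99, 0x2122], // TRADE MARK SIGN
]);

// Hypothetical resolver: returns the text for a numeric reference and
// whether it counts as a parse error under the decision in this thread
// (0x80-0x9F: parse error; 0x01-0x19: plain control characters, no error).
function resolveNumericReference(codePoint) {
  const inC1Range = codePoint >= 0x80 && codePoint <= 0x9f;
  const mapped = WINDOWS_1252_OVERRIDES.get(codePoint);
  return {
    parseError: inC1Range,
    text: String.fromCodePoint(mapped !== undefined ? mapped : codePoint),
  };
}
```

The 0x80 to 0x9F references are the ones that "don't roundtrip": the character that ends up in the DOM is not the one the reference names.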
Re: [whatwg] why, e.g., input/@checked=checked ?
On Wed, 6 Jun 2007, Henri Sivonen wrote: Requiring lower case for the boolean attribute's canonical name (as value) certainly makes things friendly for clean and portable RELAX NG schemata and, thus, easier for me. It also makes things politically correct as far as XHTML5 goes. I can imagine, however, that someone else might see the case restriction as excessive. If you get feedback along these lines from your users, please let me know. We can review this in light of experience. -- Ian Hickson
Re: [whatwg] The problem of duplicate ID as a security issue
On Thu, 7 Jun 2007, Alexey Feldgendler wrote: On Thu, 07 Jun 2007 00:20:18 +0200, Ian Hickson [EMAIL PROTECTED] wrote: Preventing such attacks by a HTML cleaner would require either making a full list of all forbidden IDs, class names etc, or imposing Draconian rules upon user-supplied content, completely disallowing such useful attributes like id and class. I'm not really convinced there's that much use in user-supplied IDs and classes, but the rules needn't be that draconian. The server could automatically prepend the commentN string to IDs and classes. IDs in user-supplied content are only useful as fragment identifiers for URLs, and mangling them like that defeats this use case because you don't know N before you post the comment, and therefore can't make internal links within the body (and it's also unobvious when you try to make links to parts of your article afterwards). True. I don't have a good solution to this that doesn't involve code on the server-side, though. -- Ian Hickson
Re: [whatwg] On validation
On Thu, 16 Mar 2006, Henri Sivonen wrote: From the spec: The term validation specifically refers to a subset of conformance checking that only verifies that a document complies with the requirements given by an SGML or XML DTD. Conformance checkers that only perform validation are non-conforming, as there are many conformance requirements described in this specification that cannot be checked by SGML or XML DTDs. To put it another way, there are three types of conformance criteria: 1. Criteria that can be expressed in a DTD. 2. Criteria that cannot be expressed by a DTD, but can still be checked by a machine. 3. Criteria that can only be checked by a human. A conformance checker must check for the first two. A simple DTD-based validator only checks for the first class of errors and is therefore not a conforming conformance checker according to this specification. There are three things I don't like about this note: First, it perpetuates the "Validation means only DTD validation" mantra. Second, it mentions SGML and XML DTDs casually together. Third, it can be read to imply that using a DTD as part of a conformance checker is a good idea. Fixed. Let me know if it's still a problem. Suggested replacement text: Note: XML DTDs cannot express all the conformance requirements of this specification. Therefore, a validating XML processor and a DTD cannot constitute a conformance checker. Also, since neither of the two authoring formats defined in this specification is an application of SGML, a validating SGML system cannot constitute a conformance checker. I used basically this text. Since a large part of HTML5 involves aligning the spec with the real world, perhaps the term HTML5 validation should be defined to mean the same as HTML5 conformance checking. :-) I have added a paragraph to this effect. -- Ian Hickson
Re: [whatwg] [html5] tags, elements and generated DOM
On Thu, 16 Mar 2006, Henri Sivonen wrote: At the end of section 1.8 it says: These XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification. I'd like to see a note here. Something like this: Note: According to [XML], XML processors are not guaranteed to process the external DTD subset referenced in the DOCTYPE. This means, for example, that using entities for characters is unsafe (except for &lt;, &gt;, &amp;, &quot; and &apos;). For interoperability, authors are advised to avoid optional features of XML. Added. -- Ian Hickson
Re: [whatwg] Forbidden characters in text/html
On Sun, 19 Mar 2006, Henri Sivonen wrote: Since U+ has no legitimate reason to be there just to get dropped, is any encounter of U+ a parse error? Yes. Fixed. The way the spec is written, U+000D does not occur in the character stream immediately before tokenization, but (as in XML!) it *can* appear in the tree construction stage, because an NCR can expand into U+000D. (I'm not suggesting any changes here--just noting how it is.) Indeed. Since U+000D can occur in the tree construction stage, I think the point under 8.2.2.3.7. How to handle tokens in the main phase that says A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), or U+0020 SPACE should include U+000D as well. Good point. Fixed. On the other hand, I am wondering why the list of characters that implements the concept of whitespace in the tokenization and tree construction stages includes U+000B LINE TABULATION and U+000C FORM FEED (FF). Are they required for backwards-compatibility? I would guess they do not actually show up on the Web that often. According to the W3C Validator, those characters do not need to be allowed for formal backwards compatibility with HTML4--on the contrary. http://validator.w3.org/check?uri=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fform-feed-in-tag.html http://validator.w3.org/check?uri=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fline-tabulation-in-tag.html I don't have an opinion about U+000B. What would you want changed? U+000C is allowed because converting text files to HTML can easily end up inserting FF characters. (e.g. RFCs have FF characters, conversion to HTML often leaves them.) I see no harm in allowing them really. In order to make all conforming HTML5 documents serializable as XHTML5, it would be necessary to have a catch-all restriction stating that a document is non-conforming if parsing it causes a non-XML character ( http://www.w3.org/TR/REC-xml/#NT-Char ) to appear in the DOM. 
For clarity, it would be nice to have the same restriction on the pre-parse character stream, but such a restriction is not strictly necessary for XHTML-serializability. I don't really think we can guarantee that all conforming HTML5 documents be serializable as XHTML5 anyway. I'm reluctant to add catch-all clauses, because they tend to have unexpected consequences. -- Ian Hickson
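The "non-XML character" test proposed in this message can be sketched directly from the Char production of XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The function names below are hypothetical; the sketch only shows what such a check would flag, not anything the spec adopted.

```javascript
// True when the code point matches the XML 1.0 Char production.
function isXmlChar(cp) {
  return cp === 0x9 || cp === 0xa || cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff);
}

// Hypothetical helper: returns the first code point in the text that
// could not be serialized as XML, or null when every character is fine.
function firstNonXmlChar(text) {
  // for...of walks the string by code point, so astral characters are
  // seen whole and lone surrogates show up as 0xD800-0xDFFF.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (!isXmlChar(cp)) return cp;
  }
  return null;
}
```

Note that U+000C FORM FEED and U+000B LINE TABULATION, both discussed above as allowed whitespace, fail this check, which is one concrete case where a conforming HTML5 document would not serialize as XHTML5.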
Re: [whatwg] basefont
On Mon, 20 Mar 2006, Lachlan Hunt wrote: I'm just wondering how you're intending to deal with basefont? AFAIK, the only browser that supports it these days is IE, but it does so by breaking the DOM (I could be mistaken, but I think NN4 supported it too). Considering that no other modern browser supports it and that IE's DOM looks like this when basefont is used: <!DOCTYPE html> <title>test</title> <p><basefont face=Arial size=3>test</p> #comment: CTYPE ht HTML HEAD TITLE BASEFONT face=Arial BODY P #text: test BODY (shown as error) I think it should be made officially obsolete. It should be inserted into the DOM as an empty element, but UAs should ignore it. UAs may choose to support it at their own risk, but must not do so by breaking the DOM like IE does. Agreed. Done. -- Ian Hickson
Re: [whatwg] The problem of duplicate ID as a security issue
On Thu, 7 Jun 2007, Alexey Feldgendler wrote: On Thu, 07 Jun 2007 00:42:31 +0200, Ian Hickson [EMAIL PROTECTED] wrote: IDs in user-supplied content are only useful as fragment identifiers for URLs, and mangling them like that defeats this use case because you don't know N before you post the comment, and therefore can't make internal links within the body (and it's also unobvious when you try to make links to parts of your article afterwards). True. I don't have a good solution to this that doesn't involve code on the server-side, though. Some form of sandboxing would be one. If sandboxing would solve it then I'll treat this issue as closed and deal with the sandboxing problems separately. -- Ian Hickson
Re: [whatwg] Still more comments and questions on Web Apps 1.0
elements It is strange and potentially confusing that the notion of top and bottom is reversed compared to the conventional use of those terms in connection to stacks. I agree. I am, however, reluctant to change it at this point, lest I make a mistake. 8.2.2.3.7. In the after head phase even white space implies the start of body. Is that intentional? This no longer appears to be the case. 8.2.2.3.7. The algorithms to be run on opening li, dt and dd do not say anything about parse errors when elements whose end tag is not optional get popped. Those should, in my opinion, count as parse errors. Done. 8.2.2.3.7. The insertion modes pertaining to tables specify the handling of comment tokens as parse errors and the comments are inserted on the foster parent. Is that intentional? It looks like an oversight. This seems fixed now. Cheers, -- Ian Hickson
Re: [whatwg] id and xml:id
On Sun, 2 Apr 2006, Henri Sivonen wrote: Since UAs handle whitespace in the id attribute inconsistently (see below) Note that there is interoperability (in that, we have two browsers that do the same thing, and one of those is IE, even). old specs imply or require whitespace trimming Old specs imply or require a lot of things. ;-) and ids with whitespace are unreferencable from whitespace-separated lists of ids, True. I suggest adding the following language concerning document conformance: The value of the id attribute must be a string that consists of one or more characters matching the following production: [#x21-#xD7FF]|[#xE000-#xFFFD]|[#x1-#x10] (any XML 1.0 character excluding whitespace). I've made it non-conforming for an ID to contain a whitespace character. Also, I suggest requiring that elements must not have both id and xml:id and requiring that xml:id must not occur in the HTML serialization. (Again, from the document conformance point of view--not disputing requirements on browsers.) I don't really want to mention xml:id. If someone wants to write a spec that affects our spec, that's their business. I don't think it makes sense for us to go ahead and then ban their spec. That's not to say that xml:id is good or bad, it just doesn't seem relevant to mention it in our spec. If an element had both an id attribute and an xml:id attribute with different values, the document would not be HTML-serializable, which would be bad. That applies to any document that has nodes from other namespaces. xml:id isn't special in that sense. If an element was allowed to have an id attribute and an xml:id attribute with the same value, the following constraint from xml:id spec would be violated even for conforming docs: An xml:id processor should assure that the following constraint holds: * The values of all attributes of type “ID” (which includes all xml:id attributes) within a document are unique. 
( http://www.w3.org/TR/xml-id/ ) I don't really understand what you mean there. Finally, as the ultimate ID nitpicking, the spec should state that it is naughty of authors to turn attributes other than id and xml:id into IDs via the DTD. (Well, using a DTD at all is naughty. :-) Again, if they want to do that, that's their business. I don't see that as a big problem. Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html The script tries every id with a whitespaceless value to see if whitespace is trimmed before ID assignment. Safari and IE 6: id='a' PASS id='2' PASS id='&lt;' PASS id=',' PASS id='&auml;' PASS id=' c ' FAIL id='\nd\n' FAIL id='\t\te\t\t' FAIL id='&#13;f&#13;' FAIL That's what the spec requires today. -- Ian Hickson
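The document-conformance rule mentioned in this message (an ID must not contain a whitespace character) could be checked along these lines. The function name is hypothetical, and the exact set of whitespace characters is an assumption: it reuses the list quoted in the "Forbidden characters in text/html" thread, plus U+000D.

```javascript
// Assumed whitespace set: TAB, LF, LINE TABULATION, FF, CR, SPACE.
const SPACE_CHARACTERS = /[\t\n\u000b\u000c\r ]/;

// Hypothetical conformance check for an id attribute value: it must be
// non-empty and must not contain any of the characters above. This says
// nothing about how browsers look the id up; it is only the authoring rule.
function isConformingId(value) {
  return value.length > 0 && !SPACE_CHARACTERS.test(value);
}
```

Applied to the values in the test case above (after entity expansion), the first five ids are conforming and the last four are not, which lines up with the browsers not trimming whitespace: an id such as ' c ' is kept as-is and simply cannot be referenced from a whitespace-separated list.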
Re: [whatwg] &apos; in text/html
On Tue, 25 Apr 2006, Henri Sivonen wrote: Should &apos; be a valid character reference in text/html? If not, what would be correct error handling? I went with making it valid, since it's valid in XML. That's problematic, because allowing it as a conforming entity reference does not add any expressiveness to the language but makes conformance checking less useful as an authoring aid (because &apos; fails in IE and such failure could be trivially avoided). True, but I think having the predefined XML entities be a subset of the HTML ones on the long term is better, even if it does cause short-term minor pain. I think I'm going to emit a warning even if &apos; is conforming. That seems reasonable. I am uncomfortable with &LT;, &GT;, &AMP;, &QUOT; and &COPY; on aesthetic grounds, but at least they work interoperably. Yeah. -- Ian Hickson
Re: [whatwg] Common number formats
On Tue, 25 Apr 2006, Henri Sivonen wrote: I assume number formats in attributes consistently do not allow whitespace before and after. Am I right? The spec says (for unsigned integers): # A string is a valid non-negative integer if it consists of one or more # characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). Since space characters aren't in the range U+0030 to U+0039, they are not allowed, whether at the start, middle, or end. I assume that an explicit + sign is always forbidden. Correct? The spec says: # A string is a valid integer if it consists of one or more characters in # the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally # prefixed with a U+002D HYPHEN-MINUS (-) character. Is the - sign forbidden in front of zero? (Would be logical considering that the explicit plus is forbidden.) The - is always allowed, per the text above. Most definitions of integers allow leading zeros but the width and height for canvas don't. Is this intentional? This has been fixed. I guess floats allow leading zeros as well. Correct? # A string is a valid floating point number if it consists of one or more # characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), # optionally with a single U+002E FULL STOP (.) character somewhere # (either before these numbers, in between two numbers, or after the # numbers), all optionally prefixed with a U+002D HYPHEN-MINUS (-) # character. Do percentages allow leading zeros? (Leading zeros are harmless in Firefox, Opera and Safari, at least.) I haven't defined percentages yet, as we may not need them. But I imagine I would define them as being just a kind of integer followed by a % character. -- Ian Hickson
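The three definitions quoted in this message translate fairly directly into regular expressions. The sketch below is one reading of the quoted text, not the spec's own wording; the constant and function names are hypothetical.

```javascript
// One or more ASCII digits, nothing else (no whitespace, no sign).
const NON_NEGATIVE_INTEGER = /^[0-9]+$/;

// As above, optionally prefixed with "-". An explicit "+" never matches.
const INTEGER = /^-?[0-9]+$/;

// One or more digits with at most one "." before, between, or after
// them, optionally prefixed with "-". At least one digit is required.
const FLOATING_POINT = /^-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)$/;

function isValidNonNegativeInteger(s) {
  return NON_NEGATIVE_INTEGER.test(s);
}

function isValidInteger(s) {
  return INTEGER.test(s);
}

function isValidFloatingPointNumber(s) {
  return FLOATING_POINT.test(s);
}
```

Read this way, the answers given above fall out of the patterns: leading zeros are accepted, "-0" is accepted, and both surrounding whitespace and "+" are rejected because neither is in the digit range.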
Re: [whatwg] Doctype conformance requirements
On Sun, 7 May 2006, Simon Pieters wrote: The conformance requirements section[1] states that: HTML documents that use the new features described in this specification and that are served over the wire (e.g. by HTTP) must be sent as text/html and must start with the following DOCTYPE: <!DOCTYPE html>. So, if I read this correctly, HTML documents that aren't served over the wire need not have a doctype? Fixed. -- Ian Hickson
Re: [whatwg] <script type=""> and <style type=""> parsing
On Sun, 21 May 2006, Anne van Kesteren wrote: Based on http://testsuite.org/html/elements/script/001.htm and http://testsuite.org/html/elements/style/001.htm and the results in Internet Explorer, Firefox and Opera it seems parsing can be made pretty strict. The only real problem is <script> with the type="" attribute set to the empty string. It seems all three browsers treat that as if it was an ECMAScript/JavaScript type, while in fact it is not. Internet Explorer handles the same situation with <style> correctly... Ian, perhaps you have statistics that show we don't have to worry about <script type=""> and can make the specification to say that browsers must ignore the content in that case? By the way, I was planning on filing bugs on Mozilla for both the testcases, but couldn't find out what the right component would be. Anyone with ideas? On Mon, 22 May 2006, Joost 'AlthA' de Valk wrote: I've tested those two tests in WebKit nightly, and see that WebKit fails a few more than Firefox does. I would be very interested in some statistics as well :). The statistics are depressing. Anyway, I've defined (to some extent) the processing of type="" and language="". I'm not really sure exactly what more to say. If we get into much more detail, we'll start having to define the difference between JS1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.5 with E4X, 1.6, 1.6 with E4X, 1.7, 1.7 with E4X, JScript, JScript.Encode, and VBScript. At least. I'm not sure we want to go there (mostly because I have no idea what we should say). -- Ian Hickson
Re: [whatwg] Steps for finding one or two numbers in a string
On Sat, 9 Jun 2007, Kristof Zelechovski wrote: On Fri, 14 Apr 2006, Henri Sivonen wrote: I think i18n political correctness has no place in attributes. I think they should be ASCII only with the XML notion of whitespace. I agree and believe the spec already requires this. That statement was not precise enough. It applies to attribute names, not to attributes as such. I don't understand, could you elaborate? -- Ian Hickson
Re: [whatwg] Steps for finding one or two numbers in a string
On Tue, 12 Jun 2007, Kristof Zelechovski wrote: Attribute names are limited to ASCII, attribute values are not. Neither are limited to ASCII. I don't understand. The discussion was concerning the numeric search algorithms for progress bars, not attribute names. What exactly are you requesting should change? -- Ian Hickson
Re: [whatwg] Steps for finding one or two numbers in a string
On Tue, 12 Jun 2007, Kristof Zelechovski wrote: Attribute names are not and cannot be localized because they are for the software and not for the human reader. That means they are limited to ASCII whether the standard is specific about that or not. Ok... So the spec doesn't have to change? -- Ian Hickson
Re: [whatwg] <script type=""> and <style type=""> parsing
On Tue, 12 Jun 2007, Anne van Kesteren wrote: Hmm. I hope this will be defined by someone at some point. Well, at least the version switches that are important for interoperability. Me too. If you have an editor who has the time to work this out, the WebAPI WG is probably the best place for it. -- Ian Hickson
Re: [whatwg] Steps for finding one or two numbers in a string
On Tue, 12 Jun 2007, Kristof Zelechovski wrote: The specification enumerates all accepted element attributes. None of them transgresses ASCII boundaries. Since it can be directly inferred from the text, the explicit statement about that http://www.whatwg.org/specs/web-apps/current-work/#attributes0 technically is not needed, although it does no harm either. Ok. Since it does no harm and might help some readers, I'll leave it. -- Ian Hickson
Re: [whatwg] [wa1] Status of tree construction section
that require additional behaviour (like IMG, LINK, META etc.). In Web browsers it's simply not an option. Having to fire mutation events for every mutation according to the complete DOM3 Events model is prohibitively expensive. To be honest, I've not found it a burden even on the sorts of low-end devices that our software runs on (typically 300MHz CPUs, 8MB RAM, that sort of thing). Then again, I have a highly optimised event dispatcher that takes steps to minimise the work, particularly when there are no DOM listeners for the event being raised, which will almost always be the case for the events concerned (DOMNodeInserted and DOMNodeInsertedIntoDocument and the Removed counterparts). The internal default event handlers have similar filtering to eliminate any unnecessary processing quickly. Even minimal work is more than no work, and when you're dealing with thousands of elements, that's a big difference (in the order of milliseconds). In the in body section, WBR doesn't really belong with a,b,big,em... because it never had content. It probably ought to go in with area,basefont,bgsound... a bit further down, or in its own section. There's no real point bothering with putting it in the list of active formatting elements so it's coming off the stack again straight away. Fixed. Thanks, -- Ian Hickson
Re: [whatwg] [WebApps] Entity consumption
On Fri, 14 Jul 2006, J. King wrote: On Fri, 14 Jul 2006 18:53:31 -0400, Ian Hickson [EMAIL PROTECTED] wrote: On Fri, 14 Jul 2006, J. King wrote: There are two paragraphs at the end of section 8.2.1.1: # When an end tag token is emitted, the content model # flag must be switched to the PCDATA state. # # When an end tag token is emitted with attributes, # that is a parse error. They don't seem to make sense in context; are they editing artefacts? No, they're intentional... why don't they make sense? They're additional requirements on the tokenisation step. I was confused because I thought they belonged to section 8.2.1.1. I see now that they actually belong to 8.2.1, but it's kind of difficult to see that at a glance---8.2.1.1 is so long that the indentation gets pretty lost. Perhaps the 8.2.1.1 section should be after those two paragraphs---or perhaps the two paragraphs should be moved to before the list of parsing states. The paragraphs seem to fit in better with the more general nature of the beginning of the tokenization section, I think. Ok, moved them up. -- Ian Hickson
Re: [whatwg] [WebApps] Parsing: close tag open state
On Sun, 16 Jul 2006, J. King wrote: When the content model flag is set to RCDATA or CDATA in the close tag open state, the state machine is supposed to examine the next few characters, and if they match the last start tag, also examine the next character to see if it matches whitespace. If these conditions are not true, then it is supposed to Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token, and reconsume the current input character in the data state. However, there is no current input character; all the state machine has done so far is look ahead. Shouldn't it simply emit the two character tokens and switch to the data state? Fixed. -- Ian Hickson
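The point of this message is that the check is pure lookahead: nothing has been consumed when it fails, so there is nothing to reconsume. A simplified sketch of that lookahead follows. The function name, the return shape, and the exact set of characters accepted after the tag name (whitespace, ">", "/", or end of input) are assumptions for illustration, not the spec's definition.

```javascript
// Assumed set of characters that may follow the tag name.
const AFTER_TAG_NAME = new Set(["\t", "\n", "\u000b", "\u000c", " ", ">", "/"]);

// Hypothetical lookahead for the close tag open state while the content
// model flag is RCDATA or CDATA. `pos` is the index just after "</".
// Nothing is consumed: the function only inspects the input.
function closeTagLookahead(input, pos, lastStartTagName) {
  const end = pos + lastStartTagName.length;
  const name = input.slice(pos, end);
  const following = input[end]; // undefined at end of input
  const isEndTag =
    name.toLowerCase() === lastStartTagName.toLowerCase() &&
    (following === undefined || AFTER_TAG_NAME.has(following));
  return isEndTag
    ? { isEndTag: true, emit: [], nextState: "close tag open" }
    // No match: emit "<" and "/" as character tokens and switch to the
    // data state, as suggested in the message above.
    : { isEndTag: false, emit: ["<", "/"], nextState: "data" };
}
```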
Re: [whatwg] About adopting quirks mode parsing
On Wed, 19 Jul 2006, Michel Fortin wrote: On 18 Jul 2006, at 21:43, Ian Hickson wrote: It might be desirable also that a valid HTML4 document gets a conforming HTML4 DOM. If it is, then <p>s shouldn't contain <table>. I agree. Is this goal compatible with blockquote, pre, ol, ul, and dl being structured inline-level elements? Let's take this valid snippet of HTML 4: <p>Some text <ul><li>List item</li></ul> According to HTML 4 parsers, I believe the DOM will be: P #text: Some text UL LI #text: List item Right. And for compatibility with legacy content, that's what HTML5 does too. But in HTML 5, where the list can be part of a paragraph, shouldn't the list be put inside the paragraph? Giving this DOM: P #text: Some text UL LI #text: List item Or should the list be put inside the paragraph only when you have an explicit closing p tag following the list (so that it becomes invalid HTML 4): <p>Some text <ul><li>List item</li></ul></p> ? Neither. As it says in 8.1.2.5. Restrictions on content models [1]: # A p element must not contain blockquote, dl, menu, ol, pre, table, or ul # elements, even though these elements are technically allowed inside p # elements according to the content models described in this # specification. (In fact, if one of those elements is put inside a p # element in the markup, it will instead imply a p element end tag before # it.) The new content models only apply to the DOM and the XML serialisations, they can't be expressed in the HTML serialisation. [1] http://www.whatwg.org/specs/web-apps/current-work/#restrictions -- Ian Hickson
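The "imply a p element end tag" behaviour quoted from 8.1.2.5 can be illustrated with a toy model of the stack of open elements. This is a heavy simplification under stated assumptions: the real tree construction algorithm has scoping rules and error handling that are ignored here, and the function name is hypothetical. Only the list of element names comes from the quoted text.

```javascript
// Element names taken from the restriction quoted above.
const IMPLIES_P_END_TAG = new Set([
  "blockquote", "dl", "menu", "ol", "pre", "table", "ul",
]);

// Hypothetical helper: returns the stack of open elements (outermost
// first) after a start tag is seen. If the tag is in the list and a p
// is open, the p (and anything opened inside it) is closed first.
function applyStartTag(openElements, tagName) {
  const stack = openElements.slice();
  const pIndex = stack.lastIndexOf("p");
  if (IMPLIES_P_END_TAG.has(tagName) && pIndex !== -1) {
    stack.length = pIndex; // pops the p and everything above it
  }
  stack.push(tagName);
  return stack;
}
```

For the snippet in the message, the ul start tag arrives while p is open, so the p is closed and the ul becomes its sibling rather than its child, which is the first of the two DOMs described above.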
Re: [whatwg] About adopting quirks mode parsing
On Wed, 19 Jul 2006, Simon Pieters wrote: From: Ian Hickson [EMAIL PROTECTED] On Mon, 17 Jul 2006, Simon Pieters wrote: As for an algorithm for how to do that, I think that an extra flag would be sufficient. If the parser hits !-- while in RCDATA or CDATA, the flag is set to true. Then, if the parser hits -- the flag sets to false. Initially the flag is false. While the flag is true the element can't be closed. It's slightly more complicated than that due to the whole problem with things like !---, but yes. You're right. I forgot about that. I've added more test cases (008-014, and 003-004 in rcdata)[1]. Opera never treats !-- as a standalone pseudo-comment. Firefox treats !-- as a standalone pseudo-comment for script, but not for title and textarea. IE always treats !-- as a standalone pseudo-comment. Safari treats !-- as a standalone pseudo-comment for style and script, but not for noscript, noembed and noframes. Now, I think that !-- should always be treated as a standalone pseudo-comment if !-- will be treated as a standalone real comment (in PCDATA), otherwise never. (If pseudo-comments really are needed, that is.) I've made the spec do what IE does (not counting conditional comments). I haven't looked at the parsing of comments in PCDATA mode yet but I'm guessing we'll have to support !-- there too. -- Ian Hickson
Re: [whatwg] parsing: bogus comments - PIs
On Wed, 26 Jul 2006, Shadow2531 wrote: So, <?xml-stylesheet type=text/css href=?> is a bogus comment. I *was* 100% sure that the PI should be parsed into: <!--?xml-stylesheet type=text/css href=?--> Correct. Thanks Ian. Can you comment on innerHTML for this situation? If <?xml-stylesheet type=text/css href=?> is parsed into <!--?xml-stylesheet type=text/css href=?-->, what should innerHTML show? Assuming you mean the .innerHTML of a parent element, it would show the comment as you've written it above. See the innerHTML definition in the spec: http://www.whatwg.org/specs/web-apps/current-work/#innerhtml0
Re: [whatwg] Events for added nodes while page is loading
On Tue, 1 Aug 2006, Robert Græsdal wrote: It'd be nice to have an event that'd tell my script when a new DOM node has been added to the DOM tree /while it is loading/. Some documents just take quite a while to load, so it'd be nice to be able to modify nodes as they are added to the DOM tree. Browser vendors have told me that they don't want to do this due to the performance impact of such a feature. Otherwise, we already have this feature (DOM mutation events).
Re: [whatwg] XHTML and document.write()
On Mon, 14 Aug 2006, Anne van Kesteren wrote: Just a FYI. You have to deal with the edge case that the root element might be html:script. Non conforming obviously, but what's supposed to happen should still be defined. I guess you would ignore calls to document.write() in such cases or perhaps copy the element and put it inside a html:html element and try again... Ouch! Not sure if nested html:script element would make things harder here... document.write() in XHTML is defined to raise an exception. There were simply too many edge cases that make no sense whatsoever for me to work out how it could work. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] innerHTML and QNames
On Tue, 3 Oct 2006, Simon Pieters wrote: On getting .innerHTML the spec says that the tag name is used to serialize tags. However, Opera and Firefox use the local name. Also, it isn't certain that element names and attribute names will be all lower-case. Fixed, as per our discussion on IRC. I took the opportunity to clean up the use of the term tag name in a few other places where it was ambiguously used. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
On Mon, 9 Oct 2006, Robert wrote: In browsers today, the following: a href=test xmlns= ... /a ...is just a link. If we start supporting xmlns= as it works in XML, but in HTML, then literally millions of pages are going to suddenly have their links stop working, because a in the namespace (as opposed to the XHTML namespace), is not an HTML a, and thus isn't a link. How about defining a standard namespace _prefix_ for such additions to HTML? As far as I've seen, all browsers interpret the namespace prefix as part of the tag/attribute, such that for MATHML in HTML, you'd use math:add. It'd require the author use the prefix for all relevant tags, but it should work without changing anything fundamental in UAs that might break other sites. As far as I'm aware, since namespaces don't exist in HTML there's nothing particularily evil about this. On Mon, 9 Oct 2006, Anne van Kesteren wrote: This seems much more annoying to author than the proposed alternative. It's not like we'll have millions of elements to be used in HTML one day. (I hope not, at least!) The language should remain relatively simple. I'm not even sure why people suggest SVG should be included as well as that's a presentational language. It makes much more sense to bind SVG to elements using XBL. I tend to agree with Anne. It's not clear to me what the advantage of the proposed solution would be. It's not really clear to me what the problem is, even. Cheers, -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Map lang to xml:lang at the parser level
On Sun, 15 Oct 2006, Simon Pieters wrote: When parsing HTML and serializing as XML you normally want to change the lang attribute to xml:lang. But why not put it in the XML namespace at the parser level? Then when you serialize the DOM as XML it becomes xml:lang automatically. The .lang DOM attribute would reflect xml:lang. This would make it simpler to set/get the language with script in XHTML (no need to use namespace-aware methods). I don't know if this is too expensive on the parser or if there are other flaws but it's just an idea. It's an interesting idea but it isn't really compatible with what legacy UAs do, since they would expose the attribute as 'lang' but this would require the attribute to be fetched using getAttributeNS instead of getAttribute to get the same effect. There are enough other subtleties in the differences between HTML5 and XHTML5 that I think you'd have to have special code to convert between the two anyway. So I'm not sure this would gain you much.
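To illustrate why a no-namespace lang and a namespaced xml:lang are distinct attributes to namespace-aware code, here is a small sketch using Python's xml.etree.ElementTree rather than a browser DOM. The sample element is hypothetical; ElementTree's `{uri}name` key plays roughly the role that getAttributeNS plays in the DOM.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical element carrying both forms of the language attribute.
el = ET.fromstring('<p lang="fr" xml:lang="en">text</p>')

plain = el.get("lang")                    # attribute in no namespace
namespaced = el.get("{%s}lang" % XML_NS)  # attribute in the XML namespace

print(plain, namespaced)
```

A lookup by plain name never sees the namespaced attribute, which is the compatibility concern raised in the reply.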
Re: [whatwg] innerHTML in XML
On Fri, 27 Oct 2006, Anne van Kesteren wrote: <foo> <bar/> <bar/> </foo> How can foo.innerHTML be well-formed here? On Sat, 28 Oct 2006, Lachlan Hunt wrote: Anne van Kesteren wrote: <foo> <bar/> <bar/> </foo> How can foo.innerHTML be well-formed here? It could be if it were treated as an external parsed entity. I've made the spec explicitly require that innerHTML return an XML namespace-well-formed internal general parsed entity representation.
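The distinction between a well-formed document and a well-formed parsed entity can be seen with any XML parser. The sketch below uses Python's xml.etree.ElementTree; the wrapper element name is an assumption introduced purely to give the fragment a single root.

```python
import xml.etree.ElementTree as ET

# Serialised children of the foo element from the example in this thread.
fragment = "<bar/><bar/>"


def parses_as_document(text):
    """Return True if text is accepted as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two sibling elements have no single root, so this is not a document...
as_document = parses_as_document(fragment)
# ...but the same text is fine as the content of an enclosing element.
as_entity_content = parses_as_document("<wrapper>" + fragment + "</wrapper>")

print(as_document, as_entity_content)
```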
Re: [whatwg] XHTML5 DOM building and IDness
On Thu, 2 Nov 2006, Henri Sivonen wrote: The spec says: The rules for parsing XML documents (and thus XHTML documents) into DOM trees are covered by the XML and Namespaces in XML specifications, and are out of scope of this specification. However, the spec says the following about the id attribute: If the value is not the empty string, user agents must associate the element with the given value (exactly) for the purposes of ID matching (e.g. for selectors in CSS or for the getElementById() method in the DOM). [...] there is a piece of code somewhere between the XML processor and the resulting DOM tree that is analogous to an xml:id processor and that assigns IDness to attributes that are not in a namespace, have the local name id and belong to elements in the XHTML namespace. Right, that piece of code is the XHTML UA. Is that a problem? Why would the rules resulting from HTML element semantics have to be dealt with by the lower level layers? The second quote implies that the first quote is not the full story and building a DOM tree from an XHTML document byte stream is not entirely covered by the XML and Namespaces in XML specifications [...] Not entirely is a polite way of putting it. There's a huge gaping hole between the XML spec and the DOM spec, with no actual definition anywhere that says how you get from one to the other -- there's no equivalent of the HTML parser spec for XML/DOM. It's only because for most things there's an obvious mapping that the implementations are interoperable, IMHO. This is one reason why I've punted on defining document.write() for XML -- without a strict parser spec that defines at which stage the DOM is updated, there's no clear definition of how you insert things into the parser's input stream, for example.
Re: [whatwg] 9.2.2: replacement characters. How many?
On Fri, 3 Nov 2006, Elliotte Harold wrote: Section 9.2.2 of the current Web Apps 1.0 draft states: Bytes or sequences of bytes in the original byte stream that could not be converted to Unicode characters must be converted to U+FFFD REPLACEMENT CHARACTER code points. I'm concerned about the or. For example, suppose there are six upper halves of a Unicode surrogate pair in a row and no lower halves. Does that turn into six replacement characters or one? Both interpretations seem possible. I suppose I prefer six rather than one, but I don't care a great deal as long as this is locked down one way or the other. I don't really know how to define this. I'd like to say that it's up to the encoding specifications to define it. Any suggestions? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Typo in 9.2.3
On Sun, 5 Nov 2006, Elliotte Harold wrote: Otherwise if the next seven chacacters are a case-insensitive match for the word DOCTYPE, then consume those characters and switch to the DOCTYPE state. chacacters -- characters Fixed. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Entity parsing
On Sun, 5 Nov 2006, Øistein E. Andersen wrote: From section 9.2.3.1. Tokenising entities: For some entities, UAs require a semicolon, for others they don't. This applies to IE. FWIW, the entities not requiring a semicolon are the ones encoding Latin-1 characters, the other HTML 3.2 entities (amp, gt and lt), as well as quot and the uppercase variants (AMP, COPY, GT, LT, QUOT and REG). [...] I've defined the parsing and conformance requirements in a way that matches IE. As a side-effect, this has made things like na&iumlve actually conforming. I don't know if we want this. On the one hand, it's pragmatic (after all, why require the semicolon?), and is equivalent to not requiring quotes around attribute values. On the other, people don't want us to make the quotes optional either.
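The semicolon-optional behaviour can be observed with Python's standard html.unescape, which follows the character reference rules of the later HTML standard. That is an assumption about a modern library, not a statement about the 2006 draft or about IE, but the Latin-1 entity list it accepts without a semicolon matches the description above.

```python
import html

# A Latin-1 entity with no terminating semicolon is still expanded,
# even when followed directly by more letters.
lenient = html.unescape("na&iumlve")

# An entity outside the legacy list is left alone without its semicolon.
untouched = html.unescape("&hearts")

print(lenient, untouched)
```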
Re: [whatwg] Space characters
On Mon, 6 Nov 2006, Henri Sivonen wrote: On Nov 6, 2006, at 07:34, Ian Hickson wrote: On Sun, 5 Nov 2006, Henri Sivonen wrote: Is there a reason why the definition of space characters does not match the XML 1.0 and RELAX NG definition of white space (space, tab, CR, LF) but also includes (line tabulation and form feed)? Is the deviation from XML 1.0 needed for backwards compatibility with text/html UAs? I made the parser consider VT and FF as being whitespace based on, as I recall, a complete examination of every Unicode character's behaviour in the parsers I was testing. The definition of space characters matches the parser's behaviour for consistency. The definition of space characters doesn't affect the XML parser stage as far as I can recall, only attribute parsing and DOM conformance. The potential problem with it affecting DOM conformance is that it may have ripple effects to running XML tooling inside a browser engine. Gecko has an XPath implementation. Disruptive Innovations has created a RELAX NG implementation for Gecko. Running the schemas from syntax.whattf.org on a DOM inside Gecko would be interesting, since it would allow checking DOM snapshots modified by scripts. There may be other reasons to run XML machinery on an HTML DOM in a browser. Both XPath and RELAX NG assume that white space-separated tokens follow the XML notion of white space. Not being able to use the native XPath and RELAX NG notions of splitting on white space would be seriously uncool. Of course, a browser engine might get away with tampering with the XPath or RELAX NG notions of white space since the additional characters don't occur in XML. But does it make sense to inflict the cost of such tweaking on the XML parts of browser engines? Would there be serious compatibility problems if the HTML5 parsing algorithm required VT and FF to be mapped to space (after expanding NCRs) and the higher-level parts of the spec defined white space as space, tab, CR and LF? 
Well, I don't much care about VT, but I really think we should round-trip form feed. Consider, for instance, RFCs, which have form feeds. I don't like the idea of dropping them on the floor when you convert RFCs to HTML and back to text again. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
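The practical difference between the two whitespace definitions shows up when splitting a space-separated token list. The sketch below is illustrative only; the constant and function names are hypothetical and do not come from the spec, XPath, or RELAX NG.

```python
# XML 1.0 white space: space, tab, CR, LF.
XML_WS = " \t\r\n"
# The draft's space characters additionally include VT and FF.
HTML_WS = XML_WS + "\x0b\x0c"


def split_tokens(value, space_chars):
    """Split value into tokens on any run of the given space characters."""
    tokens, current = [], []
    for ch in value:
        if ch in space_chars:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


value = "a\x0cb c"  # a form feed between 'a' and 'b'
print(split_tokens(value, XML_WS))
print(split_tokens(value, HTML_WS))
```

An XML-oriented tool sees two tokens here while an HTML-oriented one sees three, which is the interoperability cost discussed in the message.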
Re: [whatwg] Handling of illegal byte-sequences (typically in UTF-8)
On Fri, 24 Nov 2006, Øistein E. Andersen wrote: Section 8.1.4: Bytes that are not valid UTF-8 sequences must be interpreted as [...] U+FFFD Section 9.2.2: Bytes or sequences of bytes [...] that could not be converted to Unicode characters must be converted to U+FFFD If I read this correctly, section 8.1.4 requires that an illegal UTF-8 sequence like F2 BF BF (the three first bytes of a four-byte sequence, obviously not followed by a continuation byte) be converted into exactly three U+FFFD characters (one for each byte), whereas section 9.2.2 also allows one single replacement character (and possibly even two) in this case (and permits an arbitrary number n of repetitions of the three-byte sequence to be replaced by any number of U+FFFD characters between 1 and 3n). I realise that the underspecification in section 9.2.2 may well be intentional, given that this section is not limited to UTF-8, but (quite possibly depending on the handling chosen) this can (more or less easily) be expressed in such a way that it applies to any encoding. Alternatively, a reference to an authoritative source would of course fulfil the purpose in the particular case of UTF-8 (if such a document can be found). [Currently, an alert reader might infer that the treatment indicated in section 8.1.4 would be preferable also in section 9.2.2, but such inference for consistency can hardly be expected.] On Fri, 24 Nov 2006, Henri Sivonen wrote: I'm inclined to think that interop in error situations doesn't need to go as deep as defining how many replacement characters (in the range 1...number of bytes in a faulty sequence) a character decoder has to emit. Apps may want to delegate character decoding to an outside library whose authors don't care about the details of HTML5. (For example, it appears that Safari is leaving this stuff to ICU.) Chances are that there's more value in being able to use a library than in getting a specific number of replacement characters on error. 
On Sat, 25 Nov 2006, Øistein E. Andersen wrote: I agree. The current slight inconsistency should probably be amended by making section 8.1.4 more liberal rather than the other way round. Done.
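Both readings discussed in this thread can be reproduced with Python's codecs machinery. This is a sketch under stated assumptions: the handler name "replace-per-byte" is hypothetical, and the count produced by the built-in "replace" handler is a property of the Python decoder, which is exactly the kind of library-defined detail the thread decided not to pin down.

```python
import codecs


def per_byte(error):
    """Emit one U+FFFD for every byte in the undecodable range."""
    return ("\ufffd" * (error.end - error.start), error.end)


# Hypothetical handler name, registered for this example only.
codecs.register_error("replace-per-byte", per_byte)

bad = b"\xf2\xbf\xbf"  # first three bytes of a four-byte sequence

library_default = bad.decode("utf-8", errors="replace")
one_per_byte = bad.decode("utf-8", errors="replace-per-byte")

print(len(library_default), len(one_per_byte))
```

The per-byte handler always yields three replacement characters for this input, while the library default is free to yield anywhere from one to three.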
Re: [whatwg] HTML syntax: space characters between attributes
On Tue, 28 Nov 2006, Simon Pieters wrote: The HTML syntax requires space characters between attributes, but the lack of space characters between attributes does not cause a parse error according to the parsing section. Attributes must be separated from each other and from the tag name by one or more space characters. I'd suggest either making it a parse error or change the syntax to make it optional. (But obviously it can't be optional when the preceding attribute is minimized or unquoted.) This was changed some time back to make the whitespace optional in most cases (except where it would otherwise be ambiguous). -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Parsing (and syntax): < in unquoted attribute values
On Wed, 29 Nov 2006, Simon Pieters wrote: The parsing section says that < in unquoted attribute values is a parse error and that it causes the tag token to be emitted. As far as I can tell < does not emit the tag token in at least Firefox, IE6 or Safari. Is it intentional to emit the tag token here? (If it is, why?) If not, should it still be a parse error (and be disallowed in the syntax section)? I've removed special processing of <. Note that the following cases no longer close start tags, despite them working interoperably in Safari and Firefox: <div<p> <div title <p> <div title=<p> And the following two no longer close tags either (only worked in Firefox): <div title<p> </div<p> All of these were allowed in SGML, as I understand it.
Re: [whatwg] file URL is overspecified
On Fri, 15 Jun 2007, Kristof Zelechovski wrote: I understand that this is fixed by HTML 5 [...] Please don't bring up issues we've already fixed. :-) -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Minor linking addition to parsing section
On Tue, 5 Dec 2006, James Graham wrote: It would be useful if in section 9.2.4.4. The trailing end phase, phrases such as Switch back to the main phase and reprocess the token. linked to the part of section 9.2.4.3.6. The insertion mode that define which insertion mode the main phase should be in when making this switch. I'm not sure I really follow what you mean. You mean link to the sentence that reads If the tree construction stage is switched from the main phase to the trailing end phase and back again, the various pieces of state are not reset; the UA must act as if the state was maintained.? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Windows-1252 entities
On Wed, 6 Dec 2006, Anne van Kesteren wrote: The section on handling entities should contain the following mapping: [...] ... mostly for legacy reasons. Let me know if the table in that section is what you wanted. Cheers, -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Windows-1252 entities
On Wed, 6 Dec 2006, Sam Ruby wrote: +1, though I would suggest one change: 159: 376 // &Yuml; The spec does indeed say this.
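The mapping mentioned here (byte 159 to code point 376, LATIN CAPITAL LETTER Y WITH DIAERESIS) can be checked with Python's Windows-1252 codec. The second lookup relies on html.unescape, which follows the later HTML standard's table for numeric character references; that is an assumption about a modern library, not about the 2006 draft.

```python
import html

# Decode the single byte 0x9F (decimal 159) as Windows-1252.
from_codec = ord(bytes([159]).decode("cp1252"))

# The same remapping applied to a numeric character reference.
from_charref = ord(html.unescape("&#159;"))

print(from_codec, from_charref)
```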
Re: [whatwg] Parsing: first bit of Close tag open state
On Wed, 6 Dec 2006, Anne van Kesteren wrote: In the top part of the Close tag open state there's no mention of consuming the next input character, and this is correct. However, it then goes on to say that you should reconsume the current input character in the data state. I think it makes more sense to say that you just have to switch to the data state there. This got fixed last week I believe.
Re: [whatwg] Content Model Restrictions on tabletr in HTML
On Wed, 6 Dec 2006, Simon Pieters wrote: From: Ian Hickson [EMAIL PROTECTED] Hm. Actually an <optgroup> start tag has to imply an </optgroup> end tag for compatibility with browsers... spec fixed. Then nested optgroups as allowed in WF2 is just another thing that only works in XHTML5? How many sites would break if </optgroup> wasn't implied here? All those that used more than one optgroup without mentioning </optgroup>. I don't think that's uncommon.
Re: [whatwg] Content Model Restrictions on tabletr in HTML
On Wed, 6 Dec 2006, Bjoern Hoehrmann wrote: * Ian Hickson wrote: No conformance criteria are broken if the user agent is assumed to have converted the document to a serialisable form by adding an appropriate tbody element and then serialised that. If the user agent has not, e.g. it shows a tree of what it thinks it serialised, and that tree doesn't have a tbody between the table and the tr, then the browser has violated A table element must not contain tr elements, under 9.1.2.5. Restrictions on content models. You are now arguing way outside the context and the draft. For example, the draft does not define what is a serialisable form, when we are to assume a user agent had performed such a conversion, or what it means when something thinks it serialised something; and we were talking about authoring tools, not arbitrary user agents and browsers. My tool did not serialize, by my definition of that word, any 'tbody' element, it would be incorrect to claim otherwise. I disagree, section 8.1. Writing HTML documents is exactly that (a definition of what is a serialisable form), and there is also the XML version (it is, presumably, well understood what it means to serialise a DOM to an XML instance). What you probably mean is when the authoring tool makes claims about the contents of the generated file. If it claims that the file contains a table element with tr child elements then it would be misbehaving, but not because table elements must not have tr child elements, but because there are no such elements in the generated file. But never mind, I certainly do not want to stop you from torturing people who wish to learn about HTML syntax. If you could provide constructive and less hostile feedback, I would be able to fix the spec. Unfortunately as it stands I actually don't know what you would like me to change in the document. Could I ask you to elaborate? I certainly would like to change the document to be more to your liking. 
Re: [whatwg] Content Model Restrictions on tabletr in HTML
On Wed, 6 Dec 2006, Henri Sivonen wrote: Side note: I considered the inference of a tbody harmless enough that the validation mode I describe as the text/html-compatible subset of XHTML5 allows tr as child of table when applied to XML. Is this a bad idea? I thought it wasn't particularly useful to flag trees with tr as child of table as something that would break if serialized as HTML5 and sent as text/html. It might break some scripts. Other than that, I don't think it's a big deal, no. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] in caption insertion mode
On Sun, 10 Dec 2006, Anne van Kesteren wrote: The Anything else case should probably trigger a parse error before reprocessing the current token. Why? Could you show a sample of markup that would go through this path and should trigger an error that isn't flagged? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] innerHTML for HTML and plaintext
On Sat, 9 Dec 2006, Anne van Kesteren wrote: On Fri, 08 Dec 2006 22:57:07 +0100, Ian Hickson [EMAIL PROTECTED] wrote: The section If the child node is a Text or CDATASection node should include the plaintext element. plaintext in general isn't supported by the innerHTML spec -- for example, it would always introduce a new /plaintext element. Is that a problem? Yeah, the way the contents of a plaintext element node are returned has given Opera at least one interop issue. What did you settle on for the implementation? I don't really know how I can fix this, given that it is trivial for the plaintext element not to be the last element in the DOM. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] several messages about XML syntax and HTML5
On Sun, 10 Dec 2006, Thomas Broyer wrote: However, text/xml-script would result in a parse-error in HTML5 (if I understand section 9.2 correctly). I've removed the parse error. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] nobr is not an active formatting element
On Mon, 11 Dec 2006, Anne van Kesteren wrote: In current browsers nobr is treated differently from other elements. <nobr>1<nobr>2</nobr>3 gives: E: nobr T: 1 E: nobr T: 2 T: 3 <nobr>1<div>2</nobr>3</div> gives: E: nobr T: 1 E: div T: 2 T: 3 This is quite different from b, strong, etc. and it probably has to be this way too because of site compat. I've tried to make nobr more compatible with IE (basically it implies a </nobr> before itself). I'd be extremely interested in implementation experience for this.
Re: [whatwg] HTML syntax: comments before doctype and doctype sniffing
On Mon, 18 Jun 2007, Philip Taylor wrote: In Firefox 2:
javascript:s='?';for(i=0;i<1006;++i)s+=' ';window.location='data:text/html,'+s+'<!doctype html><script>document.write(document.compatMode)</script>'
javascript:s='?';for(i=0;i<1007;++i)s+=' ';window.location='data:text/html,'+s+'<!doctype html><script>document.write(document.compatMode)</script>'
The first produces CSS1Compat, the second BackCompat. As far as I can tell, Firefox requires the doctype to be found when parsing [using standards-mode rules] the first 1024 characters (not bytes) from the first non-whitespace character, and then it reparses the whole document in quirks mode if necessary. Hm, indeed, how odd. (It doesn't happen if you have purely spaces there, you need some sort of content there first.) Still, I don't think we should try to duplicate this unless we have evidence that it really is needed, as I described in my previous e-mail.
Re: [whatwg] in cell should handle comments
On Tue, 12 Dec 2006, Anne van Kesteren wrote: I don't see why comments have to be processed as if they were in body. They should just be appended to the current node. Isn't that what happens if they're processed as if they were in body? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Bug in Before DOCTYPE name state?
On Fri, 22 Dec 2006, Thomas Broyer wrote: 2006/12/22, Ian Hickson: On Thu, 21 Dec 2006, Thomas Broyer wrote: Why is the DOCTYPE marked in error in the former case? Because otherwise this document: <!DOCTYPEH ...would emit a DOCTYPE that is not in error (since the token would be emitted before the bit at the end of the DOCTYPE name state). Doh! right. This changed recently, by the way, if someone could check that the spec still is indeed causing the right errors to be flagged that would be great. (I think it is, though some errors moved from the tokeniser to the tree construction phase.) In other words, why would <!DOCTYPE html> be in error while <!DOCTYPE Html> wouldn't? Both would be not in error, because of the sentence at the end of the DOCTYPE name state. OK, now understood (thank you Simon for having enlightened me) Note that this is now handled quite differently. On Thu, 21 Dec 2006, Thomas Broyer wrote: But it also has this note, which is quite confusing: Because lowercase letters in the name are uppercased by the algorithm above, the HTML letters are actually case-insensitive relative to the markup. How is it confusing? I would clarify it, but I don't know what is confusing. Maybe there's no need to clarify it, it might just have been me… Ok. It remains that the tokenization stage is a bit confusing… Yes. The tree construction stage is even worse. Just implement it exactly as written with no interpretation and you should be fine. ;-) My problem is that I'm not implementing an emitting parser (à la SAX) but a pulling parser, so I'm stopping as soon as I've found a token and return true to say hey, I've changed the TokenType, Name, Value, etc. properties to reflect a new token. ...so I'm interpreting ;-) Re tree construction, I'm about to implement it in two parts: in the pull parser when possible (handling omitted tags and misnested formatting elements) and in a tree fixer otherwise (move the meta and link into head, etc.) How has that worked for you? 
Is the spec ok for that approach? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Ampersands not followed by ASCII letters or #
On Wed, 27 Dec 2006, Henri Sivonen wrote: I noticed that the Web Apps spec itself contains script samples with unescaped JavaScript operators in pre blocks. Considering that this is not an error in HTML 4.01 as SGML and considering that it is harmless in browsers, I think the top-level Anything else case under 8.2.3.1. Tokenising entities should be split in two so that there is also an error-free case for the ASCII characters that aren't '#', aren't ASCII letters and that weren't in error in SGML-based HTML. I don't have The Handbook at my disposal right now, but the error-free case should cover at least '', '' and space characters. I've allowed: U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000B LINE TABULATION U+000C FORM FEED (FF) U+0020 SPACE U+003C LESS-THAN SIGN U+0026 AMPERSAND EOF Let me know if you want any more added to the list. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
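The allowed-character list in the reply can be expressed as a small predicate. This is only a sketch of the rule as stated in this message; the function name is hypothetical and the set reflects the list given here, not necessarily any later revision of the spec.

```python
# Characters that may follow a bare '&' without a parse error, per the
# list in this message: TAB, LF, VT, FF, SPACE, '<' and '&' (plus EOF).
ALLOWED_AFTER_AMPERSAND = {"\t", "\n", "\x0b", "\x0c", " ", "<", "&"}


def bare_ampersand_ok(text, index):
    """Return True if the '&' at text[index] needs no parse error."""
    if text[index] != "&":
        raise ValueError("index does not point at an ampersand")
    following = index + 1
    if following == len(text):  # end of file
        return True
    return text[following] in ALLOWED_AFTER_AMPERSAND


script_line = "if (a && b) return;"
print(bare_ampersand_ok(script_line, script_line.index("&")))
```

Unescaped JavaScript operators such as the one in the sample line pass the check, which was the motivating case in the original report.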
Re: [whatwg] adoption agency parse errors
On Sun, 14 Jan 2007, Anne van Kesteren wrote: Is it correct that: <!doctype html><i><p><b></i> throws exactly two parse errors because of step 1, paragraph 3 of the adoption agency algorithm? (Since you need to iterate two times through it the </i> will hit there twice.) Some of the submitted testcases from Google to html5lib assume a single error. My parser explicitly avoided reporting that particular error more than once per invocation of the AAA. Exactly how many parse errors are reported is up to the implementation, so long as it fits these criteria:
# Conformance checkers must report at least one parse error condition to
# the user if one or more parse error conditions exist in the document and
# must not report parse error conditions if none exist in the document.
# Conformance checkers may report more than one parse error condition if
# more than one parse error conditions exist in the document. Conformance
# checkers are not required to recover from parse errors.
Generally speaking, from a UI point of view, you want to only report one error per correction that the user has to make. For example, if the user omitted a quote mark in a string in a script and that caused the string's contents to be treated as code, you'd ideally just want the compiler to say missing quote mark, not start listing all the contents of the string and say how each part in turn isn't valid script. Sadly, this is often quite difficult to achieve!
Re: [whatwg] isindex prompt
On Tue, 20 Feb 2007, Anne van Kesteren wrote: I think the parsing algorithm should take the prompt= attribute of isindex into account. It replaces the string of characters placed before the input element with its contents. (In that case there will be no characters after the isindex element.) Done. On Tue, 20 Feb 2007, Anne van Kesteren wrote: Also, the prompt= attribute will not be on the input element afterwards. Done. On Tue, 20 Feb 2007, Alexey Feldgendler wrote: Are there any real-world uses of isindex remaining? Is this element worth the trouble? Sadly, yes. On Tue, 20 Feb 2007, Martijn wrote: Also, there is an action attribute, so I think it would be wise to include that one too. Done. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] several messages about discouraged things
On Sat, 24 Feb 2007, Keryx Web wrote: Speaking from __my__ experience, and the experience of those (too few) colleagues that I've met in Sweden who teach standards based web development, it is hard to make the student understand that something is wrong if he/she gets away with it. Agreed. I would like the spec to clearly state what is allowed for backwards compatibility only and what is the preferred way of marking up content. The spec doesn't allow anything purely for backwards compatibility. I would like a spec that clearly says that some ways of marking up content are detrimental to accessibility and perhaps also usability. E.g. frames, including the iframe, or tables used for layout. You would not believe how many colleagues of mine actually teach that frames are a good thing. My nephew, who studies in a nearby city, even had frames as a required feature of his work! Frames are out (except iframe, which I don't really see as being a problem, though let me know if I'm wrong on this). Tables for layout are non-conforming, though I hope to make this clearer in due course. I've added a note to myself in the spec to remind myself of this. On Sun, 25 Feb 2007, Keryx Web wrote: A few examples that I think are bad practice (99.9 % of the time it's used): - Inline styles The media-specific evils of style=, if it is allowed at all, will indeed be called out explicitly. - Empty p-elements, or p elements containing only &nbsp; The former will be allowed, as there are valid use cases (usually involving script). I'd like to ban the latter, but I'm not sure how to do it. Any suggestions? - A table within a table cell (Has this ever been used for anything but layout?) There are valid uses of that, though they are rare. But layout tables in general will be discouraged (and are non-conforming). - Iframes Why are they bad? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch
On Wed, 7 Mar 2007, Asbjørn Ulsberg wrote: (I sent this to the list already, but I think it didn't appear because I sent it with the wrong e-mail address.) I'm not sure if it has been discussed earlier, but after seeing Chris Wilson's talk on «Browser Wars Episode II: Attack of the DOMs»[1] I think it's pretty obvious that Internet Explorer needs a new switch of some sort, to be allowed to implement and fix the DOM, JavaScript, CSS1-3 etc. without breaking backward compatibility. At least that's what Chris Wilson says. As I explained on public-html, though, such a switch to introduce yet another rendering mode would, in the long term (with this practice repeated with each browser version, as Microsoft have indicated is their intention), prevent competition in the browser space. That's the worst possible outcome from a standards perspective. http://lists.w3.org/Archives/Public/public-html/2007Apr/0319.html (The word processor document space is an example of how bad things can get if we go down this road. It is basically impossible to compete with Word today because of the myriad of undocumented formats it supports.) And I agree. Internet Explorer needs a new switch. So I thought, what about using: <!DOCTYPE html> as the new switch? That would be, IMHO, disastrous. But, there's nothing we can do to stop Microsoft from inventing yet more rendering modes, nor anything we can do to stop them using <!DOCTYPE html>. We can, however, make it a violation of the specs, and indeed that has now been done (quirks mode and DOCTYPE sniffing are part of the spec). -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch
On Thu, 8 Mar 2007, Alexey Feldgendler wrote: Other browsers can also use <!DOCTYPE html> as an indication to stop applying certain hacks which make them diverge from standards in favor of interoperability with IE. I've specified DOCTYPE sniffing in the spec now, and that indeed covers this suggestion. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch
On Sat, 10 Mar 2007, Jorgen Horstink wrote: As far as I understand it, the new DOCTYPE switch is meant to 'tell' the browser that the document follows the HTML5 specification. No, the new DOCTYPE is merely meant to trigger standards mode (as opposed to quirks mode). HTML5 is set up to be backwards compatible with HTML4 documents. The opposite does not hold. There must be at least one new DOCTYPE to 'tell' the browser HTML5 is being served. No, the opposite does in fact hold. It shouldn't matter whether the document is HTML5 or HTML4 or earlier; they all have the same processing rules, given by the HTML5 spec. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch
On Sat, 10 Mar 2007, Elliotte Harold wrote: What are those of us who wish to use XML tools on our documents supposed to use? We will need a real DTD at some point, to declare the entities if nothing else. We will not be able to use <!DOCTYPE html>. XML allows you to use, e.g.: <!DOCTYPE html SYSTEM "my.dtd"> This is an XML feature, though, unrelated to XHTML5. I know some browser-centric folks here just hate DTDs and schemas of any kind; but we will need them, even if the browsers don't. We will create and use them, even if there's no normative DTD in the spec. That's entirely allowed by XML, indeed. One thing that's struck me in working with the spec over the last few days is just how hard it is to follow the various content models, and how much simpler most of them would be to read if they were described in a RELAX NG schema or a DTD. Having spoken to people who have actually used RELAX NG to describe them, I'm not sure that it really would be easier. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch
On Sat, 10 Mar 2007, Robert Brodrecht wrote: On Mar 10, 2007, at 4:37 PM, Matthew Ratzloff wrote: The seem to serve the purpose. If there are two HTML 5 specifications, browser makers can come together to decide which one to support by default when no DOCTYPE is present. Developers who would prefer the alternate standard could use the appropriate DOCTYPE. Browsers render in quirksmode by default. That's been established. At this point WHATWG has already rejected DTDs in DOCTYPE and seems pretty set on not including it. I myself would rather have some type of versioning (DTD or otherwise) in the DOCTYPE. All I've heard from WHATWG is that they don't really even like the DOCTYPE. If browsers didn't use DOCTYPE as the standards mode switch, DOCTYPE probably wouldn't even be in WHATWG's HTML 5. I'm sure most people have heard the saying "Choose your battles". Fighting for DTDs or some other type of versioning in the DOCTYPE in WHATWG's spec is not a fight that can be won as far as I can tell. Having some method to tell people what spec an author is using can be won. It's not that it's a fight that can't be won, it's just that the arguments I've heard from people about why they think we shouldn't have versioning information are more convincing than the arguments from those who think we should have versioning information. (The arguments against versioning appeal to evidence that having versioning actively harms the Web and threatens the ability for competing browsers to exist; the arguments in favour tend to be more about solving theoretical problems. It's an easy choice, really.) If there is no versioning system, there is no way to specify an alternate standard. Whatever happens, there'll only be one successful HTML standard on the Web. We don't need a technical means to choose between them, the market will do that for us. In any case, we just have the one standard right now (since the W3C and WHATWG are working on the same document).
-- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch
On Sat, 10 Mar 2007, [EMAIL PROTECTED] wrote: On Mar 10, 2007, at 8:38 AM, Mihai Sucan wrote: There's no way to advertise the document as HTML 5, and it's certainly not the purpose of the specification to do so. This is a problem. It is especially a problem now that the W3C is working on their version of HTML 5. When I asked Ian Hickson how WHATWG would handle divergence in the W3C spec [1], he said he intended to make every effort to keep the two in sync. [2] While I appreciate his effort and I fully believe that he will do his best, we are dealing with a body (i.e. the W3C) who have a history of stubbornness and unwillingness to work with important members of the community. [3] The future is still undecided, but I don't think it is a good idea to operate under the assumption that the W3C will copy and paste the entire WHATWG HTML 5 spec. [1] http://blog.whatwg.org/w3c-restarts-html-effort#comment-2020 [2] http://blog.whatwg.org/w3c-restarts-html-effort#comment-2022 [3] http://meyerweb.com/eric/thoughts/2006/08/14/angry-indeed/ Right now the two groups are using exactly the same spec, byte-for-byte, just with a different header. Even if DTDs are non-normative and antiquated in the HTML 5 spec, it at least provides some method for authors to indicate their intentions. If my intention was to write a document conforming to HTML 3.2, I can use the HTML 3.2 DTD to tell anyone in the future that I was using a certain set of elements. Wouldn't simply the act of using those elements be enough to say which elements you used? If browsers pay no attention to DTDs, as WHATWG has said time and again, browsers must be rendering the latest and greatest markup. If in 50 years, the i element has been out of use for 40 years, and browsers stop rendering that element and validators throw errors on that element, the document still conforms to the DTD. It's not the author's fault that the document doesn't perform the way it intended. Ideally, the browser should care about DTDs. 
If you're arguing that browsers in 50 years should have two modes, one with the obsolete i element not supported, and one with the obsolete i element supported (to support old content), why wouldn't it be better to simply have one rendering mode, which supported i? The WHATWG HTML 5 spec provides no way to specify what version / fork of HTML the author intended to use. Even if browsers don't pay attention, I think it is a shame that there is no way to specify (if for nothing else, to future-proof documents). I blogged about this in more detail. http://robertdot.org/2007/03/08/html-5-whatwg-versus-w3c.html I don't understand why it matters what version you're using. Or, for that matter, how most authors are supposed to work out which version they're using. It seems the WHATWG is staunchly against DTDs, even if it has an appropriate use (e.g. emails in this thread talking about XML entities). It's perfectly possible to use DTDs and entities when using XML with XHTML5. Nothing in the spec prevents that. (Browsers somewhat prevent it, since they don't fetch DTDs, though.) I've mulled over this awhile. Since DTDs aren't normative in browsers, perhaps a link element with a rel="specification" and an href="http://www.whatwg.org/specs/web-apps/current-work/" (for example) would be a new way to say, "this is the specification I used to create this document." It is easier to remember than the DOCTYPE DTDs on previous versions of HTML, and it is much more human-readable than DTDs. It addresses my concerns, and doesn't use DTDs. Why does it matter which spec you used? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Getting .innerHTML and pre\n
On Mon, 19 Mar 2007, Simon Pieters wrote: The parsing section says that a linefeed character following a pre start tag token is dropped, and the syntax section says that when serializing a linefeed must be included if the pre starts with a linefeed. So far so good. However, getting .innerHTML doesn't add the newline. Thus, if you parse and serialize with .innerHTML several times you keep eating linefeeds from pre. I think this is a problem. Step 2 in the algorithm for getting .innerHTML, "If the child node is an Element", should include something along the following lines (somewhere after the "Append a U+003E GREATER-THAN SIGN (>) character." paragraph): If the child node is an Element with a tag name pre then append a U+000A LINE FEED (LF) character. This will always add the linefeed even when it's not needed, but I guess that's fine. Fixed. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
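The round-trip problem described in this message can be reproduced with a toy model. This is a sketch under heavy assumptions, not the spec algorithm: a pre element is represented only by its text content, and every function name is hypothetical.

```javascript
// Toy model: a <pre> element is reduced to its text content.
// All names are hypothetical; this is not the spec's parser or serialiser.

// Parsing drops one line feed that immediately follows the <pre> start tag.
function parsePre(markup) {
  const body = markup.slice("<pre>".length, -"</pre>".length);
  return body.startsWith("\n") ? body.slice(1) : body;
}

// Behaviour reported in the message: the text is serialised as-is.
function serializeWithoutFix(text) {
  return "<pre>" + text + "</pre>";
}

// Proposed behaviour: always append a LINE FEED after the start tag.
function serializeWithFix(text) {
  return "<pre>\n" + text + "</pre>";
}

// Serialise and re-parse the same content a number of times.
function roundTrip(text, serialize, times) {
  for (let i = 0; i < times; i += 1) {
    text = parsePre(serialize(text));
  }
  return text;
}
```

In this model, content starting with two line feeds loses one per round trip without the fix and stays unchanged with it, at the cost of an extra line feed in the serialised form even when the content does not start with one.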
Re: [whatwg] innerHTML in HTML documents with multiple namespaces
On Tue, 27 Mar 2007, Thomas Broyer wrote: I'm actually wondering what is supposed to be the tag name for an element which is not in the HTML namespace (e.g. created with document.createElementNS). Is it the localName or the tagName (qualified name, i.e. with prefix)? The tag name is fully qualified (as in tagName). In other words, what should document.body.innerHTML end with after this script: var svg_svg = document.createElementNS("http://www.w3.org/2000/svg", "svg:svg"); document.body.appendChild(svg_svg); Should it end with <svg></svg> or <svg:svg></svg:svg>? (Firefox would have <svg></svg>) The spec requires the latter at the moment. Also, should the tag name be lowercased before inclusion in the output or is the algorithm just assuming the tag names of HTML elements have already been lowercased elsewhere? (Firefox keeps uppercase letters; elements created with document.createElement are HTMLElements and have their names lowercased at creation time; as described in the spec) Same questions with attribute names ;-) I think the spec has been clarified regarding this, let me know if it is not clear still. Cheers, -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Parsing: comment tokenization
On Sat, 7 Apr 2007, Anne van Kesteren wrote: The tokenization section should also handle: !-- !--- as correct comments for compat with the web. This means that ! shows -- and that !- shows --. These comments are not handled (though not conformant). On Sat, 7 Apr 2007, Nicholas Shanks wrote: Why on earth is this a good idea? IE7 does it. The assumption is that content therefore depends on it. AFAIK browsers and other HTML clients don't currently treat these as comments This seems to disagree with my research. [...] compelling them to do so will cause several problems: 1) Web developers currently expect things like !--5?-- to result in the comment greater than five?. Changing such expectations on a whim is harmful. It is not clear to me that this is indeed true. 2) A double HYPHEN-MINUS delimits comments within tags, this provides compatibility with XML and SGML and changing this needlessly in HTML5 will just complicate conversion. This, unfortunately, is impractical. (I say this despite having personally pushed for this for years.) 3) You claim compat with the web but don't provide any evidence to support that. Are there huge numbers of sites expecting !-- to represent a comment without content? Can such sites not be fixed instead of polluting HTML with additional rules? I'd rather have a handful of broken sites that their authors will fix than saying to the other 99% of authors hey, you can now do this and ending up with millions of broken sites. (I say broken, because they will not be backwards compatible with current or previous UAs) It seems that they will in fact be compatible; but I agree, we shouldn't encourage it. The spec makes them non-conforming. On Sat, 7 Apr 2007, Nicholas Shanks wrote: Even you must (begrudgingly?) admit that comments formatted as in your original post are not backwards compatible, even if they do reflect the state of modern UAs as you say. How can both those statements be true? I don't believe I am 'pretending' anything. 
Just stating that diverging further from SGML for No Good Reason is pointless. (And yes, supporting a few odd websites that do this already counts as not a Good Reason, websites can always be fixed!) Sadly, Web sites can't always be fixed. Many sites have been long abandoned and are no longer updated. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Parsing: </li> should be ignored
On Sat, 14 Apr 2007, Simon Pieters wrote: For compatibility with IE the parsing algorithm should probably ignore </li> tags. Test case for the above proposal: <!doctype html> <style> * { margin:0; padding:0; } ul { background:red; } li { background:lime; } </style> <ul><li></li>This line should be green.</ul> I've thought this over and as much as I'd like to be compatible with IE on this, there are a number of issues with it. There's the way that every other browser doesn't do this, which makes it a very risky change. It also means there may not be an immediate need to do this, since browsers only tend to disagree with IE when doing so doesn't break much content. There's the problem that it makes it difficult to know how to handle things like: <ul><li>test</li><!--test--></ul> It would also make future expansion difficult, too. This would have to be applied to dt and dd, and would make constructions like: <x> <dt> xx </dt> xx <dd> xx <li> xx </li> xx </dd> xx </x> ...have very different results than it appears. So, unless there's a strong reason to, I suggest we don't change this. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] web-apps/current-work/#datetime-parser
On Tue, 17 Apr 2007, Sam Ruby wrote: Step 25 If sign is negative, then shouldn't timezoneminutes also be negated? Fixed. Step 27 Shouldn't that be SUBTRACTING timezonehours hours and timezoneminutes minutes? My current time is 2007-04-17T05:28:33-04:00 The timezone is -4 hours from UTC. To convert to UTC I need to add 4 hours. Fixed. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
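The two corrections in this message (negating the minutes along with the hours, and subtracting the offset to reach UTC) can be checked with a short sketch. The function and field names below are hypothetical and only loosely mirror the variables named in the thread; this is not the spec's parsing algorithm.

```javascript
// Sketch under assumptions: `local` holds the already-parsed date and time
// fields, `sign` is +1 or -1, and the offset components are non-negative.
function toUtcIsoString(local, sign, timezoneHours, timezoneMinutes) {
  // Step 25 fix: a negative sign applies to hours AND minutes.
  const offsetMinutes = sign * (timezoneHours * 60 + timezoneMinutes);

  // Interpret the local fields as if they were UTC...
  const asIfUtc = Date.UTC(
    local.year, local.month - 1, local.day,
    local.hour, local.minute, local.second
  );

  // ...then SUBTRACT the signed offset (step 27 fix) to get real UTC.
  return new Date(asIfUtc - offsetMinutes * 60 * 1000).toISOString();
}

// The example from the message: 2007-04-17T05:28:33-04:00
const example = toUtcIsoString(
  { year: 2007, month: 4, day: 17, hour: 5, minute: 28, second: 33 },
  -1, 4, 0
);
console.log(example); // prints "2007-04-17T09:28:33.000Z"
```

Subtracting a negative offset adds four hours, which matches the message's observation that a -04:00 time needs four hours added to reach UTC.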
Re: [whatwg] void elements vs. content model = empty
On Wed, 18 Apr 2007, ryan wrote: So, I was just trying to check my blog for HTML5 conformance [1] and ran into a conformance problem that I had trouble sorting out. The conformance checker said: 1. Fatal Error: End tag param seen even though the element is an empty element. Line 121, column 73 in resource http://theryanking.com/blog/ so, I went to http://www.whatwg.org/specs/web-apps/current-work/#param to see what the restrictions on param are. In that section it says: Content model: Empty. which brought up the question "what's 'empty' mean?". In my mind, it could either be "no content allowed" or "must be a void element" (i.e., no end tag). The content model only describes what conformance means at the DOM level, it doesn't affect the syntax. To tell if you're allowed to have a closing tag, you have to see the syntax section, where it says: # Void elements only have a start tag; end tags must not be specified for # void elements. ...and: # Void elements # base, link, meta, hr, br, img, embed, param, area, col, input -- http://www.whatwg.org/specs/web-apps/current-work/#elements0 It'd be nice if we could make this clearer in the spec. Even though the language and html serialization are two different things, for the sake of authors it'd be nice to have pointers between the two. Yeah... I'm not really sure how to do that yet. I think in the long term I may add an informative block to the element definitions (the green boxes) that says something like: Syntax in text/html: Start tag may be omitted End tag must be omitted ...or whatever. Also, if there's a difference between content=empty and 'void elements' it deserves an explanation. One is just about the content model, the other is just about the syntax. They're not really related, though it happens to be the case that all elements that have an empty content model are void elements in HTML. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
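The syntax rule quoted in this reply can be sketched as a lookup over the void element list given there. This is only an illustration of the kind of check that produced the reported error; the names are hypothetical and no real conformance checker is being quoted.

```javascript
// The void elements as listed in the reply above (2007 draft of the spec).
const VOID_ELEMENTS = new Set([
  "base", "link", "meta", "hr", "br", "img",
  "embed", "param", "area", "col", "input",
]);

// Returns true if an end tag may be written for this element in text/html.
// Hypothetical helper: it covers only the void-element rule, nothing else.
function endTagAllowed(tagName) {
  return !VOID_ELEMENTS.has(tagName.toLowerCase());
}
```

Note that this check is about syntax only. Whether an element's content model is empty is a separate, DOM-level question, as the reply explains.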
Re: [whatwg] Parsing: should <foo><dd></foo> close the DD?
On Fri, 20 Apr 2007, Simon Pieters wrote: I sent a bug report to Opera saying that given the markup <foo><dd></foo>X, X should be a sibling to FOO instead of a child of DD. According to Anne the bug report was invalid per the current spec: On Fri, 20 Apr 2007 09:03:29 +0200, [EMAIL PROTECTED] wrote: I think this bug report is invalid. When you hit </foo>, dd is the bottommost node of the stack. dd is in neither the formatting nor phrasing category (it's in special) and therefore the </foo> end tag is ignored. However, in IE, Firefox and Safari, the DD does get closed at </foo>, so perhaps this is a bug in the spec? I could only get </foo> to close the dd in Firefox. In IE, the foo is treated as a void element. Opera and Safari seem to follow the spec. Without further evidence that this breaks things, I'd rather just leave the spec as is. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Incorrect character codes
On Fri, 20 Apr 2007, Philip Taylor wrote: Section 8.2.3.1: U+0061 LATIN SMALL LETTER A through to U+0078 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A, through to U+0058 LATIN CAPITAL LETTER F Should say: U+0061 LATIN SMALL LETTER A through to U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A through to U+0046 LATIN CAPITAL LETTER F It seems this is fixed now. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
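The corrected ranges are easy to verify by code point. The sketch below (the function name is hypothetical) checks the a to f and A to F ranges; it also shows why the original text was wrong, since U+0078 and U+0058 are the letters x and X.

```javascript
// Returns true for the hexadecimal letters a-f and A-F only.
// Hypothetical helper illustrating the corrected ranges from the message.
function isAsciiHexLetter(ch) {
  const cp = ch.codePointAt(0);
  const lower = cp >= 0x0061 && cp <= 0x0066; // LATIN SMALL LETTER A..F
  const upper = cp >= 0x0041 && cp <= 0x0046; // LATIN CAPITAL LETTER A..F
  return lower || upper;
}
```

With the uncorrected upper bounds (U+0078 and U+0058), letters such as g or X would have been accepted as hexadecimal digits.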
Re: [whatwg] Parsing: < in unquoted attribute values
On Wed, 25 Apr 2007, Simon Pieters wrote: The parsing section says that < in an unquoted attribute value terminates the tag. However, according to my testing[1], IE7, Gecko, Opera and Webkit don't do this -- they append the < to the attribute value. So I think the parsing section is wrong here. This was fixed recently. Additionally, the syntax section says that authors are not allowed to use < in unquoted attribute values, which should probably be changed if the parsing section is changed. Oops, forgot to fix that last time. Fixed now. On Wed, 25 Apr 2007, Anne van Kesteren wrote: IE also lets < be an attribute. It can also be part of an attribute or element name. This means that: p/ptest will become a 'p' element with a 'p' attribute which has 'test' as textContent. This basically means fewer exceptions in the tokenizer for the '<' character which would be fine with me. HTML5 requires this now. On Wed, 25 Apr 2007, Anne van Kesteren wrote: As I just mentioned on IRC, this essentially means removing the SHORTTAG TAGC OMISSION feature of SGML which appears not to be supported by Internet Explorer, Opera and maybe Safari. Indeed. On Wed, 25 Apr 2007, Jonas Sicking wrote: p/ptest will become a 'p' element with a 'p' attribute which has 'test' as textContent. This basically means fewer exceptions in the tokenizer for the '<' character which would be fine with me. We no longer support this in Mozilla (if we ever did). A reason we now explicitly forbid this is we don't want it to ever be possible to create elements with 'illegal' names. Same thing goes for attribute names. This is partially for security reasons since some elements and attributes carry very important security information. On Thu, 26 Apr 2007, Anne van Kesteren wrote: Could you elaborate on the security issues? Could you also give a definition of illegal names as it's not really clear to me what that means for HTML.
On Fri, 27 Apr 2007, Jonas Sicking wrote: Basically, for input type=file value=/etc/passwd, if part of the code thinks that that is an input element, where as other parts thinks that is and input element, you might end up in a situation where the browser sends the /etc/passwd file to the server without user interaction. That seems a bit specious given that for type=file you'd have to ignore value= anyway. Furthermore, making the be _not_ part of the tag name is what causes the security issue, as it's only when you _don't_ put it in the tag name that you end up with an input element. Anyway, that's the advantage of having a single, well-defined tokeniser, you don't have to worry about differences in opinion. :-) It also seems like a bad idea to allow a document to be parsed such as there is no way to serialize it without creating an invalid html5 serialization. We are well past that point. Example: p bogus= ...can be parsed but can't be serialised legally. As far as element names go, i don't really see a reason to allow more, or less, characters than the XML spec lets you use. The main reason is that you have to define what happens to the characters you don't allow. We don't have the option of fatal failure. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Script, style and backwards compatibility
(Thanks for forwarding forum feedback to the list. Feel free to forward my reply back to the forums, and please do continue to forward feedback from the forums, or blogs, or anywhere else, to the list!) On Mon, 30 Apr 2007, Simon Pieters wrote: From http://forums.whatwg.org/viewtopic.php?t=38 Make <noscript> allowed in XHTML5 Unfortunately the way <noscript> works makes it impractical in XHTML. You can have similar effects, however, by just using script to remove the section: <div class="noscript">...</div> <script> var n = document.getElementsByClassName('noscript'); /* walk backwards: the collection is live and shrinks as nodes are removed */ for (var i = n.length - 1; i >= 0; i -= 1) n[i].parentNode.removeChild(n[i]); </script> ...or some such. (Untested.) and generally remove differences between HTML5 and XHTML5 where possible. Indeed, removing unnecessary differences is a goal (though it is not the most important goal, and so can be trumped; for example backwards compatibility would override it, as it does with <noscript>). This could thus also imply: * Don't disallow lang= in XHTML5 Having both xml:lang= and lang= would actually cause more round-tripping problems (if they were both allowed), since xml:lang can't be used in HTML. We can't drop xml:lang, though, since XML defines it. * Don't disallow <base href> in XHTML5. This is mostly disallowed because generic XML processing wouldn't know about it, and so URIs in unrelated languages like SVG would change meaning based on whether the UA knew XHTML or not. * Don't disallow <meta charset> in XHTML5 (it doesn't do any good, but doesn't harm either). We could allow it if we required that there be an XML declaration that had the same encoding specified, but then that wouldn't be the same as HTML5, so we wouldn't have won anything. On Mon, 30 Apr 2007, Simon Pieters wrote: Anne wrote: xml:lang should be treated the same as xml:id imo (except that for now I suppose they have different handling if both the xml: and normal attribute are specified). Agreed. We can't treat xml:lang like xml:id.
An element can have multiple IDs, it can't have multiple languages. In conclusion, while I agree with the principle of keeping XHTML and HTML as close to each other as possible, I don't think they're further apart than is actually necessary. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Parsing: don't move meta and link to head
On Mon, 21 May 2007, Anne van Kesteren wrote: Internet Explorer 7 and Opera 9 don't move meta and link to the head element during parsing (much like they don't do that for style). I think that's a good enough reason to change the parsing specification to match that behavior. Besides the fact that it is more sensible as the DOM and the original input stream are closer to each other. Done. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Parsing: ignore </head>?
On Mon, 21 May 2007, Anne van Kesteren wrote: If we simply ignore </head> there's no longer a need to append elements to the head element pointer. In fact, we can remove it. I'm not sure how much this would complicate conformance checking, but it would certainly be very nice not to have such strange appending rules for the limited set of elements that have that now (link, meta, style, base). This would screw up the placement of comments between head and body. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'