[whatwg] Parsing, syntax, and content model feedback
This is a bulk reply to a variety of e-mails on the topic of the HTML5 syntax, its parsing rules, and its content models, sent to the WHATWG list.

On Sun, 27 Jul 2008, Henri Sivonen wrote: 2.3.1. Since blockquote is so abused that it is useless for AI, allowing attribution within the blockquote would be practical. Attribution isn't part of a quote. How would you distinguish quoting an attribution from quoting text with an attribution from quoting text that happens to have its attribution? Quotation marks:

<BLOCKQUOTE><p>“There’s just no nice way to say this: Anyone who can’t make a syndication feed that’s well-formed XML is an incompetent fool. Maybe this is unkind and elitist of me, but I think that anyone who either can’t or won’t implement these measures is, as noted above, a bozo.” -- <A HREF="http://www.tbray.org/ongoing/When/200x/2004/01/11/PostelPilgrim">Tim Bray</A>, co-editor of the XML 1.0 specification</p></BLOCKQUOTE>

I think if we were to allow this we would have to introduce an explicit credit element. Or maybe legend, but then why not just use figure to link the blockquote and its legend together?

On the topic of foreignObject: Shouldn't HTML5 actually *disallow* html as a child of foreignObject and make the content model of foreignObject equivalent to the content model of body? The commented out SVG-in-text/html functionality doesn't support html as a child of foreignObject. Ok, done. (I'm surprised that SVG itself doesn't give any guidance as to the contents of foreignObject. I had no hooks to use to define this.)

2.3.4. "When an element has an ID set through multiple methods (for example, if it has both id and xml:id attributes simultaneously [XMLID]), then the element has multiple identifiers. User agents must use all of an HTML element's identifiers (including those that are in error according to their relevant specification) for the purposes of ID matching." What does this mean in terms of document conformance? n/a (is the current text ok?) It's OK for the multiple ID issue.
However, this sentence is known to be confusing: "The value must be unique in the subtree within which the element finds itself and must contain at least one character." It should say that the ID must be unique within all nodes that are inserted into a document, a document fragment or an interconnected set of nodes that live outside a document or document fragment. I've tried to make it mean that.

2.9.10. I suggest the definition of i be changed to "The i element represents anything that is italicized in conventional typography." That's pretty much the only real world-compatible definition. Also, I suggest b be included in the spec and defined as "The b element represents anything (except headings) that is set in bold face in conventional typography." Is the current text ok? Yes, except the advice "The i element should be used as a last resort when no other element is more appropriate. In particular, citations should use the cite element, defining instances of terms should use the dfn element, stress emphasis should use the em element" may not be respectful of the authors' time in the absence of concrete benefits for justifying the advice (other than future potential for unconventional styling). Changed.

On Thu, 4 Dec 2008, Tommy Thorsen wrote: Consider the following simple markup:

<!doctype html></br>

If I run it through my parser, which is implemented after the html5 algorithm, the resulting dom is as follows:

html
  head
  body

The br end tag is a bit special, and should be handled as if it was a br start tag. What happens here is as follows: The "before head" insertion mode will, upon receiving a br end tag, create a head node and switch to the "in head" insertion mode. "in head" will close the head node and move on to the "after head" insertion mode. I was expecting "after head" to see the </br> and do like it does on a start tag, which is to create a body node and move to the "in body" state, but the </br> is just ignored.
I've changed my implementation of "after head" to handle </br> just like the "in head" insertion mode, which is: An end tag whose tag name is "br": Act as described in the "anything else" entry below. This results in the following dom, for the example above:

html
  head
  body
    br

This matches Internet Explorer and Opera, but not Firefox and Safari. Then again, it looks like Firefox and Safari ignore all </br> tags. Oops, this was an oversight. Fixed.

On Thu, 4 Dec 2008, timeless wrote: if we're both able to get away with ignoring all </br> tags, wouldn't the ideal forward path be to make </br> always ignored? We can't remove this from quirks, and as others have pointed out, the fewer differences the better.

On Thu, 4 Dec 2008, Henri Sivonen wrote: One option would be making the tokenizer check if an end tag has the name 'br' and turn
Re: [whatwg] XSLT and DOCTYPES
I haven't made any changes in response to the comments below. The XSLT-compat feature is available due to a number of requests, and I don't really see any harm in keeping it, even if not everybody needs it. I haven't changed the keyword to something else, mostly because I haven't found a better name yet. This is still an open issue. (XSLT-compat as a name is convenient, especially for non-XSLT purposes, precisely because it grates on people's sensibilities. Something neater would have less of a discouraging effect. So maybe we should keep the name as is regardless.)

On Thu, 18 Dec 2008, Elliotte Harold wrote: I managed to miss this one when it went around the first time, but I really have to speak up now. The second half of 8.1.1 is unnecessary noise and complexity: For the purposes of XSLT generators that cannot output HTML markup without a DOCTYPE, a DOCTYPE legacy string may be inserted into the DOCTYPE (in the position defined above). This string must consist of:

1. One or more space characters.
2. A string that is an ASCII case-insensitive match for the string "PUBLIC".
3. One or more space characters.
4. A U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (the quote mark).
5. The literal string "XSLT-compat".
6. A matching U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (i.e. the same character as in the earlier step marked quote mark).

In other words, <!DOCTYPE HTML PUBLIC "XSLT-compat"> or <!DOCTYPE HTML PUBLIC 'XSLT-compat'>, case-insensitively except for the bit in quotes. Since XSLT 1.0 can generate well-formed XHTML without any problems, there really is no need for this at all. Documents generated by XSLT that need to be conforming should simply be XHTML. Furthermore, it is false that XSLT cannot generate an HTML 5 conforming DOCTYPE in HTML mode. As proof I present this stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="html"/>
<xsl:template match="/">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE HTML></xsl:text>
<html> </html>
</xsl:template>
</xsl:stylesheet>

and the following output:

$ xsltproc test.xsl http://www.cafeconleche.org/
<!DOCTYPE HTML><html></html>

This should work in any scenario in which the XSLT processor itself is serializing the output. If it's merely generating some sort of DOM or tree to pass to another process, then all bets are off. However in that scenario, other means of producing DOCTYPES are also not guaranteed since the DOCTYPE is not part of the XPath 1.0 data model. XSLT can promise a DOCTYPE only when it controls the serialization path, regardless of the technique you use to create it. Most importantly, does it really make sense to add ever more cruft to the spec to support every legacy tool and language out there? What if we discover that K&R C won't do Unicode? Or that some old versions of Java require tags to be upper cased? A spec like this should not be making special allowances for the languages that may be used to generate it. This time I will request a specific action: delete this section completely. It has no place in the spec.

On Thu, 18 Dec 2008, Julian Reschke wrote: Elliotte Harold wrote: ... Since XSLT 1.0 can generate well-formed XHTML without any problems, there really is no need for this at all. Documents generated by XSLT that need to be conforming should simply be XHTML. ... Now if you can persuade Microsoft to implement XHTML, that might fly. Furthermore, it is false that XSLT cannot generate an HTML 5 conforming DOCTYPE in HTML mode. As proof I present this stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="html"/>
<xsl:template match="/">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE HTML></xsl:text>
<html> </html>
</xsl:template>
</xsl:stylesheet>

and the following output:

$ xsltproc test.xsl http://www.cafeconleche.org/
<!DOCTYPE HTML><html></html>

Doesn't work with Firefox's built-in XSLT engine, which ignores d-o-e (and is allowed to do so). ... Most importantly, does it really make sense to add ever more cruft to the spec to support every legacy tool and language out there? What if we discover that K&R C won't do Unicode? Or that some old versions of Java require tags to be upper cased? A spec like this should not be making special allowances for the languages that may be used to generate it. This time I will request a specific action: delete this section completely. It has no place in the spec. ... I totally disagree. The spec also fails to mention that there are more use cases than XSLT; several HTML serialization methods share this restriction with XSLT's HTML output mode. Thus, the spec should continue to allow this, but pick a more correct name. On Thu,
Re: [whatwg] Use cases for Node.getElementById
On Tue, 2 Dec 2008, Aaron Leventhal wrote: Maybe there is a deeper problem if copy paste doesn't work right because of IDs? Or maybe there should be a node.getDescendantById() method? I haven't done anything with the feedback on this thread (not quoted), because it is unclear what the use cases are, and because it is really out of scope for HTML5. I would have recommended bringing this up in the context of the Web DOM Core work instead, if there are use cases of relevance, but since Simon already replied on this thread, that seems moot also. Cheers, -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] [rest-discuss] HTML5 and RESTful HTTP in browsers
On Mon, 17 Nov 2008 m...@mykanjo.co.uk wrote: I've read that HTML5 will be providing markup for the PUT and DELETE methods. This is definitely good news - but I considered something else recently that, from what I can gather, is not in the current spec for HTML5; markup for specifying appropriate Accept headers for requests. What problem would such a feature solve? I brought this up recently in #whatwg on freenode, and I was informed that this is not currently being considered since the equivalent can be achieved by a URL parameter such as '?type=application/xml'. Many would not Accept (pun intended - sorry) that this method was significantly different, some even went as far as to suggest (disturbingly) that serving multiple content-types from the same URI is undesirable! Indeed, content negotiation on the Web has not been a particularly roaring success, and it would probably have been better if we had avoided introducing it, but that's an issue for another working group (and another era, probably -- we're likely stuck with it now).

On Mon, 17 Nov 2008, Adrian Sutton wrote: I don't see why the Accept header when following links or requesting images should be controlled by anything other than the browser. It's the browser that has to actually render the returned content so it's in the best position to decide what it can accept, not the page author. That does seem like a valid point.

On Mon, 17 Nov 2008 m...@mykanjo.co.uk wrote: as an example:

<a href="http://example.com/report">html report</a>
<a href="http://example.com/report" Accept="application/pdf">pdf report</a>
<a href="http://example.com/report" Accept="application/rss+xml">xml report</a>

So I can send a colleague a message; 'you can get the report at http://example.com/report', and they can use that URL in any user agent that is appropriate. A browser is a special case in which many different content-types are dealt with.
The same benefit is not achieved if the content is negotiated via the URL, since the user would have to know the type their user agent required and modify the URL accordingly: example.com/report?type=application/rss+xml To me, this is a much cleaner and more appropriate use of a URI. Not to mention more user-friendly. Something, I believe should be encouraged - this is why I feel it would be an important addition to HTML5. People do this today:

<a href="http://example.com/report.html">html report</a>
<a href="http://example.com/report.pdf">pdf report</a>
<a href="http://example.com/report.xml">xml report</a>

...with the e-mail just saying: http://example.com/report ...and Apache's content-negotiation module working out the best file to return. This works today, what's the problem with it? (Other than theoretical purity concerns, which have been argued both ways here and are thus not a useful criterion to evaluate solutions by.)

On Mon, 17 Nov 2008, Adrian Sutton wrote: The reason this is basically never used today is twofold: 1. It requires correctly configuring the server, beyond just putting files on the file system. Very few people actually do this. 2. It requires the user to see a URL and decide that they want to paste it into Acrobat instead of their browser, without any indication that it would actually work. Indeed. Content negotiation is not really compatible with the mental model people have of URLs (which is more similar to their model of files than to the model that URIs really represent).

On Mon, 17 Nov 2008, Smylers wrote: m...@mykanjo.co.uk writes: So I can send a colleague a message; 'you can get the report at http://example.com/report', and they can use that URL in any user agent that is appropriate. Except that in practice on receiving a URL like the above, nearly all users will try it in a web browser; they are unlikely to put it into their PDF viewer, in the hope that a PDF version of the report will happen to be available. Indeed.
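The reply above relies on the server picking the best file for a bare URL based on the request's Accept header. A rough sketch of that kind of selection is shown below. The function names and the variant table are made up for this illustration, and the matching rules are deliberately simplified; this is not Apache's actual negotiation algorithm.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a range without an explicit q parameter counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality_for(content_type, ranges):
    """Return the q value of the most specific range matching content_type."""
    best_rank, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == content_type:
            rank = 2  # exact match, e.g. application/pdf
        elif media_range == "*/*":
            rank = 0  # matches anything
        elif media_range.endswith("/*") and content_type.startswith(media_range[:-1]):
            rank = 1  # type wildcard, e.g. application/*
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


def choose_variant(accept_header, variants):
    """Pick the file whose type has the highest q; None if nothing is acceptable.

    `variants` is a hypothetical mapping of content type to file name.
    Ties go to the variant listed first.
    """
    ranges = parse_accept(accept_header)
    best_name, best_q = None, 0.0
    for content_type, filename in variants.items():
        q = quality_for(content_type, ranges)
        if q > best_q:
            best_name, best_q = filename, q
    return best_name


# Hypothetical variants of http://example.com/report, named as in the thread.
REPORT_VARIANTS = {
    "text/html": "report.html",
    "application/pdf": "report.pdf",
    "application/rss+xml": "report.xml",
}
```

With this table, a request sending `Accept: application/pdf` would get report.pdf, while a typical browser header that lists text/html first and `*/*;q=0.8` last would get report.html, which is the behaviour the thread describes for pasting the bare URL into a browser.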
A browser is a special case in which many different content-types are dealt with. It's also the most common case. Supposing I opened the above URL in a browser, and it gave me the HTML version; how would I even know that the PDF version exists? Suppose my browser has a PDF plug-in so it can render either the HTML or PDF versions; it's harder to bookmark a particular version because the URL is no longer sufficient to identify precisely what I was viewing. Browsers could update the way bookmarks work to deal with this, but any external (such as web-based) bookmarking tools would also need to change. Or suppose the HTML version links to the PDF version. I wish to download the PDF on a remote server, and happen to have an SSH session open to it. So I right-click on the link in the HTML version I'm looking at, choose 'Copy Link Location' from the menu, and in the remote shell type wget then paste in the copied link. If the link explicitly
Re: [whatwg] Parsing, syntax, and content model feedback
Ian Hickson wrote: On Mon, 22 Dec 2008, Edward Z. Yang wrote: "in the range 0x to 0x0008, U+000B, U+000E to 0x001F, 0x007F to 0x009F, 0xD800 to 0xDFFF, 0xFDD0 to 0xFDDF" U+000B is not a range. While this is technically true, I don't really see a better way to phrase this that isn't verbose (e.g. "ranges and codepoints" or some such). If it helps, consider the whole set of subranges and code points to be a single discontinuous range, hence the use of the singular "range". :-) The spec made me double-take when I read it (since it fairly clearly separates range from codepoints). Also, I messed up the copypaste while quoting, so the text I cited is not actually what's there, it's: "in the ranges U+0001 to U+0008, U+000B, U+000E to U+001F, U+007F to U+009F, U+D800 to U+DFFF, U+FDD0 to U+FDDF, and characters U+FFFE..." It seems fairly clear to me that U+000B should be moved to the list of characters (at the cost of the nice ordering) or we should collapse ranges/characters into one range.

On Tue, 23 Dec 2008, Edward Z. Yang wrote: You're still checking the next input character at that point, so P is still the next input character, so the next six are PUBLIC. At least, that's how I'm defending what the spec says. :-) The spec is pretty unambiguous about this: "The next input character is the first character in the input stream that has not yet been consumed. Initially, the next input character is the first character in the input." and, at the beginning of the section: "Consume the next input character:" So, the spec is wrong. In practice I think having the text be clear ("PUBLIC") is less confusing than having it be pedantic ("P and UBLIC" or "this and the next five" or some such). It's not like people are going to assume the spec is allowing XPUBLIC or *PUBLIC and so forth, right? I understand this consideration, and there are several ways we could go about doing this.
I think the easiest would be to un-consume a character, and then perform the checks, and then reconsume the character. As for people making this mistake... well, you're looking at one. :-) Cheers, Edward (accidentally emailed only Ian; re-sending to WHATWG list)
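The un-consume, check, reconsume idea suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the InputStream class, the after_doctype_name handler and its return values are hypothetical names invented for this sketch and are not taken from the spec text.

```python
def ascii_lower(s):
    """Lowercase only A-Z, so the comparison is ASCII case-insensitive."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


class InputStream:
    """Minimal input stream with consume/unconsume (illustrative only)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # index of the next input character

    def consume(self):
        """Return the next input character and advance, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def unconsume(self):
        """Step back so the last consumed character is the next input character again."""
        if self.pos > 0:
            self.pos -= 1

    def next_matches(self, keyword):
        """Check whether the next len(keyword) characters match keyword."""
        chunk = self.text[self.pos:self.pos + len(keyword)]
        return ascii_lower(chunk) == ascii_lower(keyword)


def after_doctype_name(stream):
    """Hypothetical state handler for what follows the DOCTYPE name."""
    ch = stream.consume()  # "Consume the next input character"
    if ch is None:
        return "eof"
    stream.unconsume()  # put it back so the six-character check starts at P or S
    if stream.next_matches("PUBLIC"):
        stream.pos += len("PUBLIC")  # consume the whole keyword
        return "public"
    if stream.next_matches("SYSTEM"):
        stream.pos += len("SYSTEM")
        return "system"
    stream.consume()  # reconsume the character before falling through
    return "bogus"
```

Because the check always starts at the un-consumed character, an input such as XPUBLIC falls through to the last branch with only the X consumed, so the "next six characters" wording and the actual stream position agree.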
[whatwg] Merry Christmas!
Probably you didn't notice, but it is 25th December today. Merry Christmas and Happy New Year to all members of WHAT and W3C working groups! Giovanni
Re: [whatwg] Merry Christmas!
On Thursday, 25.12.2008, at 17:32 +0100, Giovanni Campagna wrote: Probably you didn't notice, but it is 25th December today. Merry Christmas and Happy New Year to all members of WHAT and W3C working groups! I'm celebrating Isaac Newton's birthday, you insensitive clod! ;D Holiday cheers -- Nils Dagsson Moskopp http://dieweltistgarnichtso.net
Re: [whatwg] Merry Christmas!
hey Giovanni, merry xmas and happy new year! :) Filippo On Thu, Dec 25, 2008 at 5:32 PM, Giovanni Campagna scampa.giova...@gmail.com wrote: Probably you didn't notice, but it is 25th December today. Merry Christmas and Happy New Year to all members of WHAT and W3C working groups! Giovanni
Re: [whatwg] Merry Christmas!
Filippo Levizzani wrote: hey Giovanni, merry xmas and happy new year! :) Filippo Indeed! Merry Christmas and a happy new year 2009! Regards, Philipp Serafin
Re: [whatwg] Merry Christmas!
Giovanni Campagna wrote: Probably you didn't notice, but it is 25th December today. Merry Christmas and Happy New Year to all members of WHAT and W3C working groups! Giovanni Merry Christmas to you and to everyone celebrating Christmas! Happy and holy celebrations to everyone celebrating a religious festivity or anything else in this period! Happy holidays to everyone having holidays in this period for whatever reason! Alex
Re: [whatwg] Parsing, syntax, and content model feedback
2008/12/25 Ian Hickson i...@hixie.ch We're very constrained by the legacy for text/html's syntax; sadly, usability concerns aren't really able to make us change the language. [...] The goal is not to guess what the author meant when the author makes a mistake; the goal is to have interoperable, predictable, defined behavior for all input. [...] See below Could you elaborate on how spec design like XHTML modularisation has any impact on developers of Web applications? I was under the impression that the only benefit was in the development of other specs based on the modules (and that only if those needs happened to mesh with the particular modules picked). Because of XHTML Modularization, I can build an XML Schema of different markup languages, integrate and validate them, without overlap or redundancy. Yes, DTDs are not dead, they're just not used in browsers. I assume you mean in forms? 1) use XMLSchema datatypes It's unclear how XML Schema datatypes would work with HTML forms and how they would be better than what we have in forms in HTML5 now. XForms use XMLSchema datatypes (default or user-defined). Btw, I meant the first part of HTML5 spec (parsing of date - time - numbers - etc.) 2) additional DOM interfaces, which include HTMLElement - HTMLCollection - HTMLFormControlsCollection - HTMLOptionsCollection - DOMTokenList - DOMStringMap 2) you don't need HTMLElement: markup insertion, attributes querying can be done using DOM3Core (which in the latest browsers is even more performant as no parser is involved), events are far better handled by DOM3Events, styling is included by CSSOM you don't need collections either: just use appropriate DOMNodeLists, while for DOMStringMap you may use binding specific features (all Objects are hash maps in ECMAScript3): it works this way even in HTML5 Both HTMLElement and collections are in DOM2 HTML (even DOM1 HTML). DOMStringMap is basically nothing but a binding-specific feature.
I don't see your point. XHTML2 is not backwards compatible, and was a big part of the motivation behind starting the HTML5 effort. Still, see below. Extensibility is an anti-feature -- we specifically don't _want_ people to extend HTML without working with the wider community. That way lies fragmentation of the language and lack of interoperability. Indeed, what little non-centralised extension HTML has seen -- spacer, blink, marquee -- has been widely decried as a disaster. XMLHttpRequest was invented by Netscape, now it is a W3C Technical Report (I don't remember what maturity level). The same with so called DOM level 0 (now HTML5) Extensibility doesn't mean proprietary extensions: it means that organizations other than WHATWG can collaborate and you are sure that their extensions don't break existing specifications. Also this affects new versions of current modules: for sure XForms 1.1 or XMLEvents 2, when they'll be finished, won't break XHTML Structural or Embedded Attributes Module. structure, sectioning, grouping are the same; It's unclear why you think XHTML2's features in this area are better than HTML5's. Can you elaborate? "are the same" means that literally there is no difference between XHTML2 and HTML5 about structure, sectioning and grouping of text text is very similar: you don't have time, but you can have <span datatype="xsd:date" content="2008-12-21">Today</span> as in HTML5 you have <time value="2008-12-21">Today</time>; Why is that better? It seems far worse. Because you can have date, but you can have email, phone number, zip postalcode, that is any data that have specific formats. Or if you are focused on semantics you use property instead of datatype, and you can put there whatever you feel like, not just times, progresses or range values. for progress and meter semantic you can use role attribute (for styling you always use CSS); That would have a terrible accessibility story as far as I can tell. Why?
Implementations will look for elements with role=progress (imaginary) instead of progress element, and they will read/display/print/whatever the content attribute (same as time) or element's content. Apart from performance, what is the difference between: document.getElementsByTagName("progress") and document.querySelectorAll("[role=progress]") or document.evaluate([...@role=progress]); (not sure about XPath syntax, btw) when searching for progress elements to get their semantics? The same obviously with any element This is an area where we are mostly just constrained by legacy -- ins and del are from HTML4, not new in HTML5. See below embedding is much more powerful as any element can be replaced by embedded content; This isn't more powerful, it's more buggy. Just compare object with img. Making things general is something that language designers often feel is a good way to solve many problems at once, but usually it just ends up not solving any of the problems
Re: [whatwg] Modal dialogs in HTML5
Ian Hickson wrote: On Thu, 18 Dec 2008, Philipp Serafin wrote: I think it would be a good idea to spec this algorithm as well then. The algorithm I described is basically CSS' shrink wrap algorithm. But we can't really require it, as it assumes that the OS has windows. My desktop, for example, doesn't have resizable windows, it only has tabs (I use the 'ion' window manager). Well, you could still phrase it something along the lines of "The size of a popup document's viewport SHOULD be calculated using the CSS shrink wrap algorithm..." etc etc. All I really want is to make sure browsers don't use today's implementations. If you open a popup/modal dialog today and don't specify a size, you usually end up with an arbitrary default size or even a full-fledged second browser window the size of your screen - both pretty ill-fitted for the use-cases of showModalDialog() IMO. Example:

data:text/html,<button onclick="window.showModalDialog('data:text/html,<div style=\'width:1000; height:100; border: 1px solid black;\'>Awkward-sized dialog box</div>');">Click me!</button>

You still might want to keep the parameter for back compat though and just mark it as deprecated. (Your algorithm would deform existing popup windows that assume their lengthy descriptions get line-wrapped automatically.) We'll probably define it in the obsolete APIs section in due course. cool! Regards, Philipp Serafin
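For reference, the CSS shrink wrap behaviour mentioned above comes down to the shrink-to-fit formula from CSS 2.1: min(max(preferred minimum width, available width), preferred width). A toy sketch of how a popup viewport width could be derived from it is below. The text measurement is a crude stand-in (fixed-width characters, wrapping only at whitespace), and the helper names and the 8-pixel character width are assumptions made for this illustration, not anything a browser is known to do.

```python
def shrink_to_fit(preferred_min_width, preferred_width, available_width):
    """CSS 2.1 shrink-to-fit width: min(max(min-content, available), preferred)."""
    return min(max(preferred_min_width, available_width), preferred_width)


def measure_text(text, char_width=8):
    """Very rough text metrics, assuming fixed-width characters.

    preferred width: the longest line with no wrapping except at explicit newlines
    preferred minimum width: the longest single word (wrapping at every space)
    """
    lines = text.split("\n")
    preferred = max((len(line) for line in lines), default=0) * char_width
    preferred_min = max((len(word) for word in text.split()), default=0) * char_width
    return preferred_min, preferred


def popup_viewport_width(text, screen_width, char_width=8):
    """Hypothetical helper: size a popup to its content, capped by the screen."""
    preferred_min, preferred = measure_text(text, char_width)
    return shrink_to_fit(preferred_min, preferred, screen_width)
```

Short content such as "Click me!" gets a viewport just wide enough for the text, while a long unbroken paragraph is capped at the available width and wraps. That second case matches the back-compat concern raised in the message: existing dialogs that rely on a fixed default width for line wrapping would be laid out differently.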
Re: [whatwg] Modal dialogs in HTML5
On Thu, Dec 25, 2008 at 8:29 PM, Philipp Serafin phil...@gmail.com wrote: Well, you could still phrase it something along the lines of The size of a popup document's viewport SHOULD be calculated using the CSS shrink wrap algorithm... etc etc. as an embedder of a browser for a small device, i do *not* want such a requirement
Re: [whatwg] Audio canvas?
On Wed, 16 Jul 2008, Dr. Markus Walther wrote: I have noted an asymmetry between canvas and audio: canvas supports loading of ready-made images _and_ pixel manipulation (get/putImageData). audio supports loading of ready-made audio but _not_ sample manipulation. [...] Question: What do people think about making audio more like canvas as sketched above? I think such a feature would be quite interesting, but I don't think we should pursue it in HTML5 at the moment. If this is something that there is interest in developing, I recommend bringing it up in the W3C Web Apps working group in conjunction with browser vendors. Experimental implementations would go a long way towards demonstrating what is possible and what should be specified. -- Ian Hickson