Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Tue, 2006-11-28 at 16:20 -0500, Sam Ruby wrote: I believe that I could modify my weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the embedded SVG content, something that would needs to be discussed separately. I think having /two/ different serializations of Web Forms 2.0/Web Applications 1.0 is bad enough. To try and cater to what's effectively a third serialization compatible with both parsing methods is to reinvent the XHTML 1.0 as text/html mess. Serializing to multiple formats from a single source is, I think, a better model. Especially as embedded content may need different treatment too. Lachlan's observations [...] on what it would take to change the popular WordPress application to produce HTML5 compliant output As blogging software goes, WordPress is pretty good. But then blogging software is generally atrocious when it comes to markup. Trying to design an (X)HTML spec for a group of PHP developers who think it's persuasive to bang on about their dedication to web standards while serving their project's non-validating XHTML 1.1 homepage as text/html is doomed to failure. Adapting WordPress to be an efficient creator of good HTML (4 or 5) would be a big job, probably entailing a total rethink at some levels. But hacking WordPress to output valid HTML 4.01 is by no means impossible. A method for removing the trailing slashes you're worrying about was posted to the WordPress support forum: http://wordpress.org/support/topic/76431 -- Benjamin Hawkes-Lewis
[whatwg] SVG and significant inline
Since the img element in the XHTML namespace counts as significant inline content, the svg element in the SVG namespace should probably, too. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Benjamin Hawkes-Lewis wrote: On Tue, 2006-11-28 at 16:20 -0500, Sam Ruby wrote: I believe that I could modify my weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the embedded SVG content, something that would needs to be discussed separately. I think having /two/ different serializations of Web Forms 2.0/Web Applications 1.0 is bad enough. To try and cater to what's effectively a third serialization compatible with both parsing methods is to reinvent the XHTML 1.0 as text/html mess. Serializing to multiple formats from a single source is, I think, a better model. Especially as embedded content may need different treatment too. That was not the intent of my suggestion. I am suggesting that HTML5 standardize on *one* format. One that comes as close as humanly possible to capturing the web as it is practiced in all of its glorious and often quite messy detail. Those that wish to serialize the DOM in other formats are certainly free to do so, but those formats aren't HTML5. I do have an opinion on how embedded content should be handled, but I am trying to focus on one issue at a time. If you would like a preview, take a peek at: http://planet.intertwingly.net/ http://planet.intertwingly.net/top100/ http://golem.ph.utexas.edu/~distler/planet/ Those three planets take input from a number of frankly grungy input sources and consistently produce well formed XML that often contain embedded MathML or SVG content. You are, of course, free to explore those pages and others; but, for now, I would like to focus on one question: If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Can you cite three independent examples of existing websites where the parsing would diverge? Lachlan's observations [...] 
on what it would take to change the popular WordPress application to produce HTML5 compliant output As blogging software goes, WordPress is pretty good. But then blogging software is generally atrocious when it comes to markup. Trying to design an (X)HTML spec for a group of PHP developers who think it's persuasive to bang on about their dedication to web standards while serving their project's non-validating XHTML 1.1 homepage as text/html is doomed to failure. I'm pretty sure that the Mozilla home page was not created with WordPress, and I'm absolutely sure that the Microsoft home page was not. Conversely, if the major browser vendors have to choose between the web as it is commonly practiced, and a spec that doesn't reflect that reality, which one do you think they will choose? I'll argue that the choices aren't as black and white as either the question you posed above, or even the one that I did. No matter what the WHATWG spec says, each vendor will independently make a cost/benefit analysis as to how they should treat trailing slashes in elements like img. But before they do, this work group certainly can anticipate that question. What is the cost of accepting trailing slashes on elements which are always defined with a content model of empty, except when found in the "Attribute value (unquoted)" state? What sites would be parsed differently based on this change? Are those differences in line with how existing browsers actually behave, or at odds with this behavior? - Sam Ruby
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Robert Sayre [EMAIL PROTECTED] wrote: On 11/29/06, Lachlan Hunt [EMAIL PROTECTED] wrote: I do not think it's a good idea to make the trailing slash conforming. Although it is harmless, it provides no additional benefit at all and it creates the false impression that the syntax actually does something. It does do something, in systems that think they are using XML (whether they actually are is another matter). It's possible it will prevent many information-free validation errors, and give the HTML5 more credibility as a result. Warning people about <img /> in the validator is a waste of their time. It's not a good idea to confuse them any more by giving the impression that it works for some elements but not others. It's better to just say it doesn't work at all and forbid it in all cases. Better? This is an opinion, and it's not backed up by data. So far, it looks like Sam has the data on his side. People do it, and it tends to work interoperably. Except when it doesn't. For example, here's a fragment of hotmail.com's signup page, served as text/html. It's the only example I've come across to date: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr"> ... <select id="iRegion" name="pff010004" /> <script>...</script> </select> ... The script just document.write's loads of option tags (it's the country menu). It's hard to know what the author thought was going on. Did they think it was XHTML and just got stymied by the server configuration? I'm still in favour of permitting the trailing slash, personally. -- Stewart Brodie Software Engineer ANT Software Limited
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Nov 28, 2006, at 23:20, Sam Ruby wrote: In HTML5, there are a number of elements with a content model of empty: area, base, br, col, command, embed, hr, img, link, meta, and param. If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Obviously, 0% with parsers that opt to implement the HTML5 parsing algorithm with error recovery as opposed to Draconian error handling-- except for the detail whether error-reporting parsers report an error or not. (In theory, this is an issue for non-browser UAs that opt to implement Draconian error handling. In practice, even my mostly Draconian parser treats this particular error as non-fatal, because it is so common and so easily recoverable.) The basis for my question is the observation that the web browsers that I am familiar with apparently already operate in this fashion, this usage seems to have crept into quite a number of diverse places, and all this is coupled with Lachlan's observations[3] on what it would take to change the popular WordPress application to produce HTML5 compliant output. WordPress is a soup-in-soup-out system that shouldn't be trying to produce the XML syntax in the first place. But now that WP is using it, the question becomes: which is more costly: asking the WP developers to change their system or to adjust the definition of conformance so that WP looks conforming more easily. Anyway, as Lachlan already pointed out, whether or not the useless slash should be allowed on elements whose content model is empty is not an issue of technical damage to parsing interoperability but about damage to the mental model of confused authors. So the cost to consider is the cost of the confusion. 
As a side benefit of this change, I believe that I could modify my weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the embedded SVG content, something that would need to be discussed separately. I am against blurring the distinction between the XML serialization and the HTML serialization. The infamous Appendix C didn't bring about good things. Having a text/html serialization that is also parseable as XML doesn't work from the UA point of view, because reality requires UAs to parse text/html using an HTML parser. Now, since UAs can't use an XML parser for parsing text/html anyway, it becomes useless for content providers to ensure that their text/html content is XML-parseable. Restricting the XML syntactic sugar, such as the use of CDATA sections or <foo/> vs. <foo></foo> on the application/xhtml+xml side would be wrong in principle, because it is wrong for a higher-layer spec to micromanage lower-layer syntactic sugar or, worse, give differences in syntactic sugar a difference in meaning. In practice, limiting XML details of the application/xhtml+xml serialization would be useless, because it is processed using XML processors which are required to support full syntactic sugar anyway. I think that your blog system is a special case. Considering that I have seen the Yellow Screen of Death on your blog, it appears that you aren't using an isolated serializer that could be swapped. However, the reason why your site works is that it is built vastly more competently than other systems that don't use an isolated serializer *and* because you are both the developer and the deployer and you care about these issues, you can and do fix bugs quickly. That just doesn't work with systems that aren't constantly managed by the developer. So no offense intended, but I think that what would work for you (or Jacques Distler) isn't generalizable. Rather, a warning to the effect of "professional driver on closed road" would be appropriate. 
:-) -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
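The distinction the thread keeps returning to (a trailing slash on an always-empty element, on any other element, or inside an unquoted attribute value) can be sketched in a few lines of Python. This is only an illustration, not the HTML5 tokenizer: the regular expression is a deliberate simplification that ignores comments, script content and malformed tags, the element list is the one quoted in this thread, and the function name and verdict strings are made up for the example.

```python
import re

# Elements with a content model of "empty", as listed in the thread.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "command", "embed",
    "hr", "img", "link", "meta", "param",
}

# Simplified start-tag matcher (an assumption of this sketch, not the
# spec's state machine): tag name, attribute text, optional trailing slash.
START_TAG = re.compile(
    r"""<([A-Za-z][A-Za-z0-9]*)((?:[^>"']|"[^"]*"|'[^']*')*?)(/?)>"""
)

# Attribute text that ends in an unquoted value, e.g. ' href=http://x'.
ENDS_IN_UNQUOTED_VALUE = re.compile(r"""=[^\s"']*$""")


def classify_trailing_slashes(markup):
    """Return (tag_name, verdict) for every start tag written with '/>'."""
    results = []
    for match in START_TAG.finditer(markup):
        name, attrs, slash = match.groups()
        if not slash:
            continue  # ordinary start tag, nothing to classify
        name = name.lower()
        if ENDS_IN_UNQUOTED_VALUE.search(attrs):
            # In the unquoted attribute value state the slash is data.
            verdict = "part of attribute value"
        elif name in VOID_ELEMENTS:
            verdict = "harmless on void element"
        else:
            verdict = "treated as plain start tag"
        results.append((name, verdict))
    return results


print(classify_trailing_slashes(
    '<p>Hi<br /><img src="a.png"/><div/><a href=http://x/>'
))
```

Under these assumptions only the first category is what the proposal would make conforming; the other two are the cases where an author's XML-based expectations and an HTML parser's behaviour diverge.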
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Lachlan Hunt wrote: Sam Ruby wrote: In HTML5, there are a number of elements with a content model of empty: area, base, br, col, command, embed, hr, img, link, meta, and param. If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Can you cite three independent examples of existing websites where the parsing would diverge? If it's only allowed on empty elements (now known as singleton elements in the spec) then this isn't about changing the handling, it's just about defining what is and is not conforming. Exactly. I do not think it's a good idea to make the trailing slash conforming. Although it is harmless, it provides no additional benefit at all and it creates the false impression that the syntax actually does something. The fact is that authors already try things like <div/>, <p/> and even <a/>. I've seen all of those examples in the wild. See, for instance, the source of the XML 1.0 spec (and many others) which claim to be XHTML as text/html, littered with plenty of <a/> tags all throughout. If these are common, and implemented interoperably, then what is the harm? An example of something that is NOT implemented interoperably is <script src=.../>. In my book, a document that states that it always is a parse error to do something despite abundant evidence to the contrary is not as useful as one that says here are the places where it works, and here are the places where it does not. I've even come across various authors either thinking that does work, or (when they find out the truth) wondering why it doesn't. It's not a good idea to confuse them any more by giving the impression that it works for some elements but not others. It's better to just say it doesn't work at all and forbid it in all cases. That's a slippery slope. At the extreme, it leads to XHTML 2.0, where features that are thought to be problematic are removed. Think of the children. 
By contrast, in HTML5, I see a document that attempts to be considerably less judgemental, and considerably more resilient. Inside the comments in the HTML 5 document I see statistics lovingly cited. Example: <!-- As of 2005-12, studies showed that around 0.2% of pages used the <image> element. --> What percentage of pages use <img/> constructs? and all this is coupled with Lachlan's observations[3] on what it would take to change the popular WordPress application to produce HTML5 compliant output. That just illustrates a fundamental flaw in the way WordPress has been built. It is a perfect example of a CMS built by a bunch of bozos [1] and cannot be used as an excuse for allowing the syntax. Be careful when you patronize. Is there really any excuse for allowing <b><i>OMG!</b></i>? No, but HTML5 is willing to pinch its nose with thumb and forefinger and look the other way. It literally is not a battle worth fighting. As a side benefit of this change, I believe that I could modify my weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the embedded SVG content, something that would need to be discussed separately. No you couldn't, and how would that be a benefit if you could? XHTML 5 requires xmlns, HTML 5 forbids it. HTML 5 requires <!DOCTYPE html>, XHTML 5 doesn't (though it's still well-formed, so you could get away with it). The last I saw, HTML 5 is a working draft. Did I miss a memo? With Venus, I translate all content into a canonical well formed XML format. This lets people who author filters worry about far fewer random edge cases. I've already seen a lot of inventiveness when people find that they can apply off the shelf XML tools like XPath and XSLT. I'd gladly put a <!DOCTYPE html> in my page, the question is: would the WHATWG be willing to meet me half way and allow xmlns attributes in a very select and carefully prescribed set of locations? 
By the way, my experience is that these types of conversations always start off bumpy not merely due to the well known limitation of email for conveying human emotion. The problem is deeper than that: there literally is no good place to start. The only way I know how to deal with that is to pose, and repeat, concrete and simple questions. And the one that I am posing with this thread is as follows: If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Can you cite three independent examples of existing websites where the parsing would diverge? [1] http://hsivonen.iki.fi/producing-xml/ - Sam Ruby
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
To me, '</' or '/>' mean the tag's done. Therefore, '<select />...</select>' (or anything similar) is just plain wrong -- that would be a select list with nothing in it, then some options that are hanging out somewhere on their own, then an unmatched closing select. This shouldn't validate, serializers shouldn't allow it, and deserializers should simply ignore the options and '</select>' (or maybe dump the options' text to the output and just ignore the '</select>'). Now this, '<img src=... />' -- which is what I thought this discussion was about initially -- is perfectly valid; it's nothing more than a tag without content. On 11/29/06, Stewart Brodie [EMAIL PROTECTED] wrote: Robert Sayre [EMAIL PROTECTED] wrote: On 11/29/06, Lachlan Hunt [EMAIL PROTECTED] wrote: I do not think it's a good idea to make the trailing slash conforming. Although it is harmless, it provides no additional benefit at all and it creates the false impression that the syntax actually does something. It does do something, in systems that think they are using XML (whether they actually are is another matter). It's possible it will prevent many information-free validation errors, and give the HTML5 more credibility as a result. Warning people about <img /> in the validator is a waste of their time. It's not a good idea to confuse them any more by giving the impression that it works for some elements but not others. It's better to just say it doesn't work at all and forbid it in all cases. Better? This is an opinion, and it's not backed up by data. So far, it looks like Sam has the data on his side. People do it, and it tends to work interoperably. Except when it doesn't. For example, here's a fragment of hotmail.com's signup page, served as text/html. It's the only example I've come across to date: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr"> ... <select id="iRegion" name="pff010004" /> <script>...</script> </select> ... The script just document.write's loads of option tags (it's the country menu). It's hard to know what the author thought was going on. Did they think it was XHTML and just got stymied by the server configuration? I'm still in favour of permitting the trailing slash, personally. -- Stewart Brodie Software Engineer ANT Software Limited
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Wed, 29 Nov 2006 17:15:53 +0100, Sam Ruby [EMAIL PROTECTED] wrote: I do not think it's a good idea to make the trailing slash conforming. Although it is harmless, it provides no additional benefit at all and it creates the false impression that the syntax actually does something. The fact is that authors already try things like <div/>, <p/> and even <a/>. I've seen all of those examples in the wild. See, for instance, the source of the XML 1.0 spec (and many others) which claim to be XHTML as text/html, littered with plenty of <a/> tags all throughout. If these are common, and implemented interoperably, then what is the harm? An example of something that is NOT implemented interoperably is <script src=.../>. What do you mean with "implemented interoperably"? They are all treated as if they are just a start tag. (So they are actually treated identically to the <script src=/> case, except for some versions of Safari and Opera and maybe Firefox which do what some people might expect for <script src= /> ...) -- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Wed, 29 Nov 2006 17:15:53 +0100, Sam Ruby [EMAIL PROTECTED] wrote: Is there really any excuse for allowing <b><i>OMG!</b></i>? No, but HTML5 is willing to pinch its nose with thumb and forefinger and look the other way. It literally is not a battle worth fighting. Just like <b />, that causes a (perhaps several) parse error. Nothing special is done about it in HTML5. -- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Wed, 29 Nov 2006 17:00:46 +0200, Robert Sayre [EMAIL PROTECTED] wrote: On 11/29/06, Lachlan Hunt [EMAIL PROTECTED] wrote: I do not think it's a good idea to make the trailing slash conforming. Although it is harmless, it provides no additional benefit at all and it creates the false impression that the syntax actually does something. It does do something, in systems that think they are using XML (whether they actually are is another matter). It's possible it will prevent many information-free validation errors, and give the HTML5 more credibility as a result. Warning people about <img /> in the validator is a waste of their time. It's not a good idea to confuse them any more by giving the impression that it works for some elements but not others. It's better to just say it doesn't work at all and forbid it in all cases. Better? This is an opinion, and it's not backed up by data. So far, it looks like Sam has the data on his side. People do it, and it tends to work interoperably. I want to show support for Sam's proposal. I agree with him. I see HTML 5 as a specification that tries to be tailored to the current needs of web developers, trying to cope with all the bad markup and tag soup on the web. It also defines complex algorithms for error recovery, everything supposedly leading one day to UAs with HTML 5 implementations that will render all tag soup, and proper markup, the same (interoperability - a somewhat utopian dream, nonetheless we must not give up). Of course, the algorithms described won't work as wanted with all tag soup, but the algorithms are trying to be well balanced, a compromise between the bad, the ugly and the good. All this leads me to say that Sam's proposal is a good one. One cannot expect that WordPress, all content management systems, all web developers, etc, will start working with pure HTML 5, or pure XHTML 5, in a single project, or even in a single page. 
XML parsers break if the code has no trailing slashes where needed; the majority of HTML parsers do not break if the author uses trailing slashes. Some web developers also make use, on the server, of XHTML and XML documents, which end up being sent to the UA - parts of, or entirely. Why do trailing slashes need to break conformance? If trailing slashes are not accepted into HTML 5, then many other bad things should be banned. From the start, the error recovery should be eliminated, and errors treated as XML parsers do: stop on error, with no recovery. Web developers want to be able to share code between XHTML and HTML projects. The trailing slash issue should be nonexistent. Today many sites use this trailing slash in HTML pages. Even if those pages do not validate today, I consider they should validate, as long as they validate without the trailing slashes. Take for example PHP, which is used by many confused web developers. PHP provides the nl2br() function which searches for new lines and adds <br />. Using that, they automatically invalidate their site. And that's only a very simple function. Very few web developers are not bozos [1]. :) [1] http://hsivonen.iki.fi/producing-xml/ -- http://www.robodesign.ro ROBO Design - We bring you the future
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On 11/29/06, Anne van Kesteren [EMAIL PROTECTED] wrote: On Wed, 29 Nov 2006 17:10:10 +0100, Robert Sayre [EMAIL PROTECTED] wrote: Perhaps it would be better to prove that the current rules result in easy explanations. What would the text of a bug filed on WordPress look like? Let's assume you actually want them to fix it, not just make a point. The bug would request that Wordpress doesn't try to output XML for the text/html media type. That seems to be the problem here. Ok, so what would the text be? What problem would you tell them you were fixing? -- Robert Sayre
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Anne van Kesteren wrote: On Wed, 29 Nov 2006 17:10:10 +0100, Robert Sayre [EMAIL PROTECTED] wrote: Perhaps it would be better to prove that the current rules result in easy explanations. What would the text of a bug filed on WordPress look like? Let's assume you actually want them to fix it, not just make a point. The bug would request that Wordpress doesn't try to output XML for the text/html media type. That seems to be the problem here. If the code for Wordpress fit on a page, that suggestion would be easy to implement. As it stands now, it appears that several hundred lines of code would need to change. And in each case, the code would need to be aware of the content type in effect. In some cases, that information may not be available. In fact, that may not have been determined yet. One way cross-cutting concerns such as this one are often handled is to simply capture the output and post-process it. Lachlan opted to do so with the WHATWG Blog. The first pass for things like this generally takes the form of simple pattern matching and regular expressions. Often this evolves. What would be better is something that could take that string and produce a DOM, from which a correct serialization can take place. Now, what type of parser would you use? HTML5's rules come tantalizingly close to handling this situation, except for a few cases involving tags that are self-closing... - Sam Ruby
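The "capture the output and post-process it" first pass described above might look roughly like the Python sketch below. It is not the code used on the WHATWG Blog or anything from WordPress; the function name is hypothetical, the element list is the one quoted in this thread, and the pattern has exactly the weaknesses Sam alludes to: it knows nothing about comments, script content, or unquoted attribute values that end in a slash.

```python
import re

# Always-empty elements named in the thread.
VOID = "area|base|br|col|command|embed|hr|img|link|meta|param"

# Start tag of a void element written XML-style, e.g. <br /> or <img ... />.
# Quoted attribute values are skipped as units so a '/' or '>' inside
# them is not mistaken for the end of the tag.
TRAILING_SLASH = re.compile(
    r"""<((?:%s)\b(?:[^>"']|"[^"]*"|'[^']*')*?)\s*/>""" % VOID,
    re.IGNORECASE,
)


def strip_void_slashes(html_text):
    """Rewrite <br /> as <br>, leaving every other tag untouched."""
    return TRAILING_SLASH.sub(r"<\1>", html_text)


page = '<p>one<br />two<img src="x.png" alt="a/b"/><div/></p>'
print(strip_void_slashes(page))
```

Note that <div/> is deliberately left alone: the pattern only rewrites the elements where the slash is known to be meaningless, which is why approaches like this tend to evolve toward a real parser and serializer.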
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Wed, 29 Nov 2006 17:29:42 +0100, Robert Sayre [EMAIL PROTECTED] wrote: The bug would request that Wordpress doesn't try to output XML for the text/html media type. That seems to be the problem here. Ok, so what would the text be? What problem would you tell them you were fixing? I won't be fixing anything on Wordpress for the foreseeable future. Anyway, the bug report would point to http://www.hixie.ch/advocacy/xhtml and try to talk them into switching to HTML4 or something so they can easier switch to HTML5 later. -- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On 11/29/06, Anne van Kesteren [EMAIL PROTECTED] wrote: On Wed, 29 Nov 2006 17:29:42 +0100, Robert Sayre [EMAIL PROTECTED] wrote: The bug would request that Wordpress doesn't try to output XML for the text/html media type. That seems to be the problem here. Ok, so what would the text be? What problem would you tell them you were fixing? I won't be fixing anything on Wordpress for the foreseeable future. Anyway, the bug report would point to http://www.hixie.ch/advocacy/xhtml and try to talk them into switching to HTML4 or something so they can easier switch to HTML5 later. Hmm, while that's an interesting document, I'm not sure it presents a clear mental model for authors. Lachlan wrote that the current situation is clearer than Sam's proposal. So far, WHAT-WG members have failed to write a one or two paragraph bug report in clear English, with the target being the relatively advanced HTML authors working on WordPress. Can it be done? -- Robert Sayre "I would have written a shorter letter, but I did not have the time."
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Wed, 29 Nov 2006 17:31:19 +0100, Mihai Sucan [EMAIL PROTECTED] wrote: XML parsers break if the code has no trailing slashes where needed, the majority of HTML parsers do not break if the author uses trailing slashes. Some web developers also make use, on the server, of XHTML and XML documents, which end up being sent to the UA - parts of, or entirely. Why trailing slashes need to break conformance? If trailing slashes are not accepted into HTML 5, then many other bad things should be banned. From the start, the error recovery should be eliminated, and treated as XML parsers do: stop on error, with no recovery. That doesn't make sense at all. As said before, parse error does not mean that parsing has to stop, it merely indicates that a syntax error has to be flagged somewhere. Parsers are allowed to stop processing at that point, but that doesn't make sense for any parser that tries to collect data. Only for parsers validating data. -- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
... The trailing slash issue should be inexistent. Today many sites use this trailing slash in HTML pages. Even if those pages do not validate today, I consider they should validate, as long as they validate without the trailing slashes. ... I don't think that a page claiming to be authored as HTML 4.01 should validate if it contains <br />, etc., which, at least in theory, has an entirely different meaning. Another point, maybe a bit off-topic: http://annevankesteren.nl/2005/11-kurafire ;) Regards, Rimantas -- http://rimantas.com/
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On Wed, 29 Nov 2006, Robert Sayre wrote: So far, WHAT-WG members have failed to write a one or two paragraph bug report in clear English, with the target being the relatively advanced HTML authors working on WordPress. Can it be done? "Please use HTML4 instead of XHTML1 in the output from WordPress blogs. The browser used by the majority of my readers doesn't support XHTML1, and is having to rely on error handling to handle your output." -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Anne van Kesteren wrote: What do you mean with implemented interoperably? produce the same DOM - Sam Ruby
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Mihai Sucan wrote: Web developers want to be able to share code between XHTML and HTML projects. Yes, some web developers want to do stupid things. If you want to share data between HTML and XHTML, then do it properly. Parse it in one form and re-serialise it in the other. Don't just use string processing to do silly things like this: xhtml = "<p>" + html + "</p>" -- Lachlan Hunt http://lachy.id.au/
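As a rough illustration of "parse it in one form and re-serialise it in the other", here is a small Python sketch built on the standard library's html.parser. It is not an HTML5 parser: HTMLParser is only a tokenizer, so the sketch assumes the input is already properly nested, it silently drops comments and doctypes, and the class and function names are invented for this example. The void-element list is the one quoted in this thread.

```python
from html import escape
from html.parser import HTMLParser

# Always-empty elements named in the thread.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "command", "embed",
    "hr", "img", "link", "meta", "param",
}


class XhtmlSerializer(HTMLParser):
    """Tokenize text/html input and emit XML-style markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self._open(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # '<x/>' in HTML is just a start tag; only void elements self-close.
        self.out.append(self._open(tag, attrs))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    @staticmethod
    def _open(tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # XML needs a quoted value for every attribute.
            value = name if value is None else value
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        closing = " /" if tag in VOID_ELEMENTS else ""
        return "<%s%s>" % (" ".join(parts), closing)


def to_xhtml(html_text):
    serializer = XhtmlSerializer()
    serializer.feed(html_text)
    serializer.close()
    return "".join(serializer.out)


print(to_xhtml('<p class=intro>Tom & Jerry<br><img src="a.png" alt=""></p>'))
```

The point of the exercise is that quoting, escaping and the trailing slash are all decided by the serializer for the target format, rather than carried over from whatever the source happened to contain.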
[whatwg] Inferring rel=feed from the media type
Hi. HTML 5 says: "If the alternate keyword is used with the type attribute set to the value application/rss+xml or the value application/atom+xml, then the user agent must treat the link as it would if it had the feed keyword specified as well." -- http://www.whatwg.org/specs/web-apps/current-work/#link-type I believe this is in error. Atom, at least (I expect this also holds for RSS), is useful for representing more things than just feeds. It's really a generic packaging mechanism. For example, one might do the equivalent of MHTML[1] using Atom, and link such a document to an HTML page with rel=alternate. But it isn't a feed, and it isn't something you'd want syndication tools to auto-discover as a feed, since that will just confuse users. In addition, the media type on link is non-authoritative, meaning that feed-semantics would be inferred before it was even ascertained that the would-be representation was actually an Atom or RSS document. Thanks. [1] http://www.ietf.org/rfc/rfc2557.txt Mark.
Re: [whatwg] Inferring rel=feed from the media type
On Wed, 29 Nov 2006, Mark Baker wrote: HTML 5 says: "If the alternate keyword is used with the type attribute set to the value application/rss+xml or the value application/atom+xml, then the user agent must treat the link as it would if it had the feed keyword specified as well." -- http://www.whatwg.org/specs/web-apps/current-work/#link-type I believe this is in error. It is intentional, as a way of grandfathering widespread legacy practice. I agree that it is suboptimal. I'm not sure how to cater to both the existing content and, moving forward, to allow Atom to be used with rel=alternate to mean "alternate representation that isn't a feed". But it isn't a feed, and it isn't something you'd want syndication tools to auto-discover as a feed, since that will just confuse users. Putting a real feed first would get around this, but you're right that in the case you described (and assuming no feed), there'd not really be a way to get around this other than simply not including the type= attribute. In addition, the media type on link is non-authoritative, meaning that feed-semantics would be inferred before it was even ascertained that the would-be representation was actually an Atom or RSS document. Yeah. I think the spec is clear that the real MIME type overrides it once the file has been fetched; but again, existing practice constrains what we can do here. In conclusion, I'm not sure we can do anything here. We're stuck between a rock and a hard place, as it were. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
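The draft rule under discussion is small enough to write down. The Python sketch below encodes only the sentence quoted in this thread; the function name is hypothetical, and the choices to compare case-insensitively and to require an exact media type (no parameters) are assumptions of the sketch rather than anything the quoted text specifies.

```python
# Media types that trigger the legacy rule quoted from the draft.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def is_treated_as_feed(rel, type_attr=None):
    """Would a UA following the quoted rule treat this link as a feed?

    rel       -- value of the rel attribute (space-separated keywords)
    type_attr -- value of the type attribute, or None if absent
    """
    keywords = {token.lower() for token in rel.split()}
    if "feed" in keywords:
        return True  # explicitly marked as a feed
    media_type = (type_attr or "").strip().lower()
    # Legacy inference: alternate + RSS/Atom type implies feed.
    return "alternate" in keywords and media_type in FEED_TYPES


print(is_treated_as_feed("alternate", "application/atom+xml"))  # inferred
print(is_treated_as_feed("alternate"))                          # not inferred
```

The second call shows the only escape hatch mentioned above: a non-feed Atom document linked with rel=alternate avoids the inference only by leaving out the type attribute.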
Re: [whatwg] Inferring rel=feed from the media type
On Nov 29, 2006, at 19:59, Ian Hickson wrote: I'm not sure how to cater to both the existing content and, moving forward, to allow Atom to be used with rel=alternate to mean alternate representation that isn't a feed. http://www.intertwingly.net/wiki/pie/PaceEntryMediatype If it passes, of course. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote: Ok, I have submitted a bug report. http://trac.wordpress.org/ticket/3406 Let's see what happens. Well, that didn't seem too effective. :/ -- Robert Sayre
Re: [whatwg] Inferring rel=feed from the media type
Hi Ian, On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote: On Wed, 29 Nov 2006, Mark Baker wrote: HTML 5 says: "If the alternate keyword is used with the type attribute set to the value application/rss+xml or the value application/atom+xml, then the user agent must treat the link as it would if it had the feed keyword specified as well." -- http://www.whatwg.org/specs/web-apps/current-work/#link-type I believe this is in error. It is intentional, as a way of grandfathering widespread legacy practice. I agree that it is suboptimal. I'm not sure how to cater to both the existing content and, moving forward, to allow Atom to be used with rel=alternate to mean "alternate representation that isn't a feed". What about documenting that some agents make that assumption, but not prescribing that all agents must do so? And to answer your other question, the proposed new media type for Atom entry documents would only solve the problem for entries. It wouldn't solve it for the MHTML-like Atom document I described, nor for any other non-feed use of Atom... of which there most likely will be many in the future. If such a solution were used as precedent for solving the problem for those uses of Atom, it would mean a new media type for each use; a media type per link type, in fact. Ouch! So no, I'm not a fan 8-) Mark.
Re: [whatwg] Inferring rel=feed from the media type
On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote: On Wed, 29 Nov 2006, Mark Baker wrote: What about documenting that some agents make that assumption, but not prescribing that all agents must do so? The idea underlying the work here is to foster interoperability, not document the lack of interoperability, so that isn't really an option. When you're documenting age-old practice which is in widespread use, I fully agree. Feed autodiscovery is effectively brand new and not widespread at all when compared to how widespread it should become in 20 years. I think there's still lots of time to fix it. Mark.
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Sorry for being the dunce here, but is anybody saying otherwise? Whereas XML _requires_ that you close every tag, HTML5 _should allow_ you to close any tag. I agree with what was said previously about considering something like '<select /></select>' invalid, but if somebody's suggesting that something like '<img src=... />' or '<br />' should also be invalid, I disagree. Validators and UAs should accept singleton tags _with or without_ the self-closer. Am I totally misunderstanding or missing the point here? On 11/29/06, Leons Petrazickis [EMAIL PROTECTED] wrote: On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote: On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote: Ok, I have submitted a bug report. http://trac.wordpress.org/ticket/3406 Let's see what happens. Well, that didn't seem too effective. :/ This rigmarole is going to repeat on every site that has converted to XHTML sent as text/html. People are emotionally invested in the idea of trailing slashes. Websites have complex codebases, and going through them removing trailing slashes on singleton elements would be very hard. They've already reaped all the benefits of XHTML -- cleaner, more readable, more maintainable code. There's no incentive for them to agree with you. This is a minor point that we need to give to them. The very idea of HTML5 is to not demand that the Web be scrapped and rewritten. We need the people who have rewritten all their pages so that they validate on the W3C validator -- they have the fire and the zeal and the will to spread our format. We need to make the migration from invalid XHTML to valid HTML5 very, very easy for them. We can't require them to dig through PHP spaghetti. And that means that, no matter how it's achieved, <br/> needs to be valid HTML5. -- Leons Petrazickis
Re: [whatwg] Inferring rel=feed from the media type
On Wed, 29 Nov 2006, Mark Baker wrote: When you're documenting age-old practice which is in widespread use, I fully agree. Feed autodiscovery is effectively brand new and not widespread at all when compared to how widespread it should become in 20 years. I think there's still lots of time to fix it. I'm not sure what you're basing your assertion on; based on my own research of several billion documents, feed autodiscovery is used on hundreds of millions of pages, far beyond the point of no return in terms of backwards-compatibility constraints. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Lachlan Hunt wrote: ... The fact is that authors already try things like <div/>, <p/> and even <a/>. I've seen all of those examples in the wild. See, for instance, the source of the XML 1.0 spec (and many others) which claim to be XHTML as text/html, littered with plenty of <a/> tags all throughout. ... Huh? The thing at http://www.w3.org/TR/REC-xml/? Don't see that problem there. If this was the case at an earlier point in time, it was probably caused by a bug in their XSLT code, not by the authors writing the spec (which IMHO uses the W3C's xmlspec XML language). Best regards, Julian
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Thanks Ian - so is it fair to say that self-closing singletons should be _allowed_ but not _required_ -- that either syntax would be accepted as valid HTML5? That only makes sense to me -- it's backward-compatible while allowing XHTML compatibility as well. Your point about '<p />test' being the same as '<p>test</p>' is very interesting. That's not something I've ever done (that I'm aware of, anyway), and it surprises me that it works that way. As a divergent example -- at least in IE6 -- '<div />' is treated as an inline element rather than a block...that's probably non-standard behavior, and in any case it was a surprise when I encountered it. In case you can't tell, I haven't made it through the whole proposed spec yet, so apologies if my questions and observations are springing from ignorance. On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote: The argument is that the self-closer / is an XMLism, and that HTML5 has nothing to do with XML, so there's no reason for it to apply here. Note that in HTML, this: <p/> test ...regardless of what this discussion results in, will always be treated exactly the same as: <p> test </p> ...because, for legacy reasons, there's no way we can treat / as a self-closer in any tag other than void tags (like img or br). -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
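The behaviour Ian describes, where a trailing slash never closes a non-void element, can be sketched as a toy normalizer. This is only an illustration under simplifying assumptions, not the HTML5 parsing algorithm: the VOID_ELEMENTS set is an abbreviated, illustrative list, the regular expression handles only simple well-behaved tags, and the function name normalize is made up for this example.

```python
import re

# Abbreviated, illustrative set of always-empty ("void") elements.
VOID_ELEMENTS = {"br", "img", "hr", "input", "link", "meta"}

# Matches a start or end tag (with an optional trailing slash) or a text run.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*?)(/?)>|([^<]+)")


def normalize(markup):
    """Re-serialize markup while ignoring every trailing slash."""
    out = []
    open_elements = []  # stack of currently open non-void elements
    for closing, name, attrs, _slash, text in TOKEN.findall(markup):
        if text:
            out.append(text)
            continue
        name = name.lower()
        if closing:
            # Close up to and including the matching open element, if any.
            if name in open_elements:
                while open_elements:
                    top = open_elements.pop()
                    out.append("</%s>" % top)
                    if top == name:
                        break
            continue
        # Start tag: the trailing slash (captured in _slash) is ignored.
        out.append("<%s%s>" % (name, attrs.rstrip()))
        if name not in VOID_ELEMENTS:
            open_elements.append(name)  # stays open despite any "/"
    # Anything still open at the end of input is closed implicitly.
    while open_elements:
        out.append("</%s>" % open_elements.pop())
    return "".join(out)
```

With this sketch, normalize("<p/>test") and normalize("<p>test</p>") both give "<p>test</p>", while normalize("<br />x") gives "<br>x": the slash is harmless on a void element and simply has no effect anywhere else.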
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Anne van Kesteren wrote: On Wed, 29 Nov 2006 18:03:33 +0100, Julian Reschke [EMAIL PROTECTED] wrote: The fact is that authors already try things like <div/>, <p/> and even <a/>. I've seen all of those examples in the wild. See, for instance, the source of the XML 1.0 spec (and many others) which claim to be XHTML as text/html, littered with plenty of <a/> tags all throughout. ... Huh? The thing at http://www.w3.org/TR/REC-xml/? Don't see that problem there. <h5><a name="IDANQDS" id="IDANQDS" />Names and Tokens</h5> is one example... If this was the case at an earlier point in time, it was probably caused by a bug in their XSLT code, not by the authors writing the spec (which IMHO uses the W3C's xmlspec XML language). In your humble opinion or is it just a fact? :-) Aha. I thought it was about an <a /> with no attributes. So yes, that's a bug in the XSLT code (xmlspec.xsl). I'll forward this info to Norman Walsh. Best regards, Julian
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Ian Hickson wrote: On Wed, 29 Nov 2006, Leons Petrazickis wrote: This rigmarole is going to repeat on every site that has converted to XHTML sent as text/html. People are emotionally invested in the idea of trailing slashes. Websites have complex codebases, and going through them removing trailing slashes on singleton elements would be very hard. If people want to make HTML5 syntactically compatible with XHTML1, such that XHTML1 documents don't cause syntax errors in HTML5, we'll have to do a whole lot more than just allowing trailing /s. I don't really see why that would be a goal, though. Going further, if we want to make documents in general compliant with HTML5, then we've got our work cut out for us -- at least 78% of documents are syntactically incorrect today (not counting things like trailing /s in attributes, or missing DOCTYPEs -- if you include those, the number is more like 93%). I tentatively support the idea that trailing slashes on singleton[1] elements should not be a parse error. I don't think it has any actual technical merit but I think it will be helpful in getting developer mindshare; a lot of people have drunk the Zeldman Kool-Aid and have the ideas of XHTML, clean markup, CSS, and conformance to standards in general all mushed together in their brains[2]. For these people (who I think represent the upper quartile of web developers in terms of commitment to good markup) the trailing slash in empty elements is the syntax of a new generation - it is a symbol that represents everything that has changed in web design since 1996 - as intrinsically useless as a fashionable designer label but just as seductive. [1] I find that name quite confusing as it suggests there should only be one in the entire document. [2] cf. the "code is poetry" comment in the WordPress bug report, despite the fact that most here would argue HTML 4 as text/html is considerably more poetic than XHTML as text/html. -- The universe doesn't care what you believe. 
The wonderful thing about science is that it doesn't ask for your faith, it just asks for your eyes --- http://xkcd.com/c154.html
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote: On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote: Ok, I have submitted a bug report. http://trac.wordpress.org/ticket/3406 Let's see what happens. Well, that didn't seem too effective. :/ Ah, if you visit now, you'll find a WHAT-WG member has written "fundamental flaw with the way WordPress has been built" in bright red letters. Not exactly Dale Carnegie material. It still seems impossible to file a bug on teXtHTML. Sam Ruby wrote: Drawing lines in the sand and maintaining that <br /> is invalid is only going to make more busy work for a lot of people. If you try to explain why this decision was made, most won't understand, and eventually most will decide that compliance isn't worth the bother. Agree. -- Robert Sayre
Re: [whatwg] Inferring rel=feed from the media type
On Wed, 29 Nov 2006, Mark Baker wrote: On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote: On Wed, 29 Nov 2006, Mark Baker wrote: When you're documenting age-old practice which is in widespread use, I fully agree. Feed autodiscovery is effectively brand new and not widespread at all when compared to how widespread it should become in 20 years. I think there's still lots of time to fix it. I'm not sure what you're basing your assertion on; based on my own research of several billion documents, feed autodiscovery is used on hundreds of millions of pages, far beyond the point of no return in terms of backwards-compatibility constraints. I wouldn't call that a very good metric for the purposes of this discussion though, because I expect that the bulk of those pages are produced by a handful of blog hosting services. If we can shrink 100s of millions by 4 or 5 (or more) orders of magnitude with a handful of persuasively written emails, then the situation is not what I would call widespread. It's widespread _today_, such that UAs today can't change their behaviour. Thus we can't change the spec today. If you reduced the volume of such usage, then it would be worth revisiting, but unless that happens, we're merely talking hypotheticals. Personally I wouldn't be optimistic about the ability to change the legacy data; historically it has not been possible. I don't really know of any successful attempt, to the point where browsers historically have even tried using different processing modes -- the whole quirks mode thing -- to get around legacy content incompatible with the specifications. Are you able to analyze what proportion of those pages are hosted by the top, say, 10 hosters? Not from my current data set, no. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] Inferring rel=feed from the media type
On Wed, 29 Nov 2006, James M Snell wrote: Is HTML5 intended to be a description of the Way Things Are or a description of the Way Things Ought To Be? It's a description of what browsers should implement if they want to be compatible with legacy content while supporting new features. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] Rel alternate, stylesheet and feed
Ian Hickson wrote: On Wed, 29 Nov 2006, Lachlan Hunt wrote: The spec defines special handling for rel="alternate stylesheet", but also defines that alternate with type="application/atom+xml" or type="application/rss+xml" implies the feed relationship. Does this represent an alternate stylesheet, a syndication feed or both? If alternate and stylesheet are both specified, the alternate keyword doesn't imply feed, because the rest of this subsection doesn't apply if it says both. So feed isn't implied. <link rel="alternate stylesheet" type="application/atom+xml" href="/feed" title="Blog Entries"> Firefox 2 [...] recognised it as a feed. Thanks, bug filed. https://bugzilla.mozilla.org/show_bug.cgi?id=362329 -- Lachlan Hunt http://lachy.id.au/
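The rule as discussed in this thread (alternate plus a feed media type implies feed, unless stylesheet is also present) can be written down as a short sketch. It follows only the wording quoted in these messages, not any browser's actual code; the function name effective_link_types is hypothetical.

```python
# Media types that trigger the implied "feed" keyword in the quoted spec text.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def effective_link_types(rel, type_attr=None):
    """Return the set of link-type keywords a user agent would act on."""
    keywords = {keyword.lower() for keyword in rel.split()}
    media_type = (type_attr or "").strip().lower()
    implies_feed = (
        "alternate" in keywords
        and "stylesheet" not in keywords  # "alternate stylesheet" is exempt
        and media_type in FEED_TYPES
    )
    if implies_feed:
        keywords.add("feed")
    return keywords
```

For rel="alternate" with type="application/atom+xml" the result contains feed; for rel="alternate stylesheet" with the same type it does not, which is the behaviour Ian describes and the opposite of what Firefox 2 was reported to do.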
Re: [whatwg] Element content models
Anne van Kesteren [EMAIL PROTECTED], 2006-11-26 12:58 +0100: Some element content models explicitly mention that they can't contain themselves. This probably makes sense for the following elements as well:
* meter
* progress
* time
* t
* m
* abbr?
* cite?
There might be more.
annotations (footnotes, endnotes, marginalia) and acronym --Mike