William F Hammond wrote:
In the spec at 8.1.2.1 (6) (for the text/html serialization):
You seem to refer to a clause in the W3C draft,
http://dev.w3.org/html5/spec/Overview.html#start-tags
and not in the WHATWG draft http://whatwg.org/html5 (which has different
numbering). It would be nice to know in advance which draft is referred to,
especially since both of them fairly often freeze my browsers.
Then, if the element is one of the void elements, or if the
element is a foreign element, then there may be a single U+002F
SOLIDUS character (/). This character has no effect on void
elements, but on foreign elements it marks the start tag as
self-closing.
That may look unnecessarily complex, but there's a point in the
complexity. For "void elements" (elements with EMPTY declared content in the
SGML world), syntax like <br /> or even <br/> has become common when people
have tried to be modern and use XHTML, even when their documents mostly use
old HTML syntax. For "foreign elements", i.e. for XML fragments from outside
the HTML space, we must of course play by XML rules. For other elements,
it's best to assume that the "/" got there by accident, and ignore it, as
browsers currently do for HTML documents.
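The three-way rule above can be summarized as a small decision function. This is a minimal sketch, not the spec's own algorithm. The function name `slash_effect` is hypothetical. The void-element list is an assumption taken from the current WHATWG list. Whether an element is "foreign" is passed in, because a real parser decides that from the element's namespace (SVG, MathML).

```python
# Hypothetical sketch of the trailing "/" rule for the text/html syntax.
# Assumption: void-element list as in the current WHATWG HTML spec.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


def slash_effect(tag_name: str, has_slash: bool, is_foreign: bool) -> str:
    """Describe what a trailing "/" does to a start tag (sketch only)."""
    if not has_slash:
        return "none"
    if is_foreign:
        # XML rules apply: <circle/> acts as both start and end tag.
        return "self-closing"
    if tag_name.lower() in VOID_ELEMENTS:
        # Allowed but inert: <br/> means <br>, which is empty anyway.
        return "no effect"
    # Not allowed on other HTML elements; treated as an accident and ignored,
    # so <p/> is parsed as <p>.
    return "ignored"


for name, foreign in [("br", False), ("circle", True), ("p", False)]:
    print(name, slash_effect(name, True, foreign))
```

The point the sketch makes is that only the `is_foreign` branch ever changes the document tree. For void elements the slash is decorative, and for everything else it is discarded.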
It would be better to allow self-closing tags on all de facto empty
elements, foreign or not and defined-empty or not.
I don't quite understand the phrase "de facto empty elements". If you treat
"/" as making a tag "self-closing" (i.e. closing the _element_ by acting
as both start and end tag), then you of course make the element's content
empty. So what's the point of the words "de facto"?
This is better because (1) authors are given more choice and (2) DOM
building is simplified.
Item (1) is a counter-argument, because we don't need any more choices for
authors in an already confusing situation. Considering that HTML 5 will be
an incomplete draft with only partial implementations for many, many years,
there will be misunderstandings and hearsay-based authoring, so that people
use different syntaxes without knowing what they are doing. Browsers do not
actually treat <p /> as <p></p>, so why would you give authors the
impression that they do?
Item (2) isn't relevant because DOM building would not be essentially
simplified, and because any simplification there would be at most a minor
convenience to people who write browsers. And this isn't really about DOM
building but about parsing.
For example, while it is true that major browsers seem to treat "<p/>"
as an open tag, the relevant question for backward compatibility is
whether anyone has been relying on the idea that "<p/>" can be used to
begin a non-empty paragraph.
It would be odd to intentionally rely on that, but if a document
accidentally contains, say, <html /> at the start, should the page really be
displayed as empty?
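The practical difference between the two policies shows up when content follows the slashed tag. Here is a toy sketch of a tree builder run on `<p/>Hello` under each policy. The tokens are split by hand, the function name `build` and the `honor_all_slashes` flag are hypothetical, and none of the real HTML5 tree-construction rules (implied end tags, `html`/`body` insertion) are modelled.

```python
# Toy tree builder: tokens are (kind, value, has_slash) tuples, pre-split by hand.
# honor_all_slashes=False mimics current browsers (slash ignored on <p/>);
# honor_all_slashes=True mimics the proposal (every "/" closes the element).
def build(tokens, honor_all_slashes):
    root = {"name": "#root", "children": []}
    stack = [root]  # currently open elements
    for kind, value, has_slash in tokens:
        if kind == "text":
            stack[-1]["children"].append(value)
        elif kind == "start":
            node = {"name": value, "children": []}
            stack[-1]["children"].append(node)
            if not (has_slash and honor_all_slashes):
                stack.append(node)  # element stays open for following content
        elif kind == "end":
            # Close the innermost open element with this name, if any.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i]["name"] == value:
                    del stack[i:]
                    break
    return root


tokens = [("start", "p", True), ("text", "Hello", False)]
print(build(tokens, honor_all_slashes=False))
print(build(tokens, honor_all_slashes=True))
```

With the slash ignored, "Hello" ends up inside the paragraph. With the slash honored, the paragraph is empty and the text falls outside it. An accidental slash on an element that wraps the whole page would strand all of its content in the same way.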
But there's a stronger reason too, related to the fact that people fairly
often write markup like
<a href=/foobar.html>
relying on corrective processing of the attribute value implying quotes,
<a href="/foobar.html">
rather than on any particular treatment of the slash. Such sloppy attribute
syntax is fairly common and usually causes no problems, unless the author
starts validating the page and becomes wildly confused
(see the Saga of the slashed validators,
http://www.cs.tut.fi/~jkorpela/qattr.html ). It's common, and the HTML 5
draft appears to "legalize" it.
And you can link to the root of a server using <a href=/>. This may be bad
style, but it hasn't really harmed anyone. Making that tag "self-closing",
i.e. equivalent to <a href=></a>, would not be nice.
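Why the slash in `<a href=/>` never gets a chance to close anything can be seen from how an HTML5-style tokenizer reads an unquoted attribute value: the value runs until whitespace or ">", so a "/" there is simply part of the value. Below is a minimal sketch, assuming only a small subset of the tokenizer states (no character references, no error reporting, no whitespace around "="). The function name `tokenize_start_tag` is hypothetical.

```python
# Minimal sketch of reading one start tag, loosely following the HTML5
# tokenizer states. Returns (tag_name, attributes, self_closing_flag).
WS = " \t\n\f"


def tokenize_start_tag(s: str):
    assert s.startswith("<")
    i, n = 1, len(s)
    name = ""
    while i < n and s[i] not in WS + "/>":   # tag name state
        name += s[i].lower()
        i += 1
    attrs, self_closing = {}, False
    while i < n:
        c = s[i]
        if c in WS:
            i += 1
        elif c == ">":
            break
        elif c == "/":
            # Self-closing start tag state: counts only if ">" follows.
            if i + 1 < n and s[i + 1] == ">":
                self_closing = True
            i += 1
        else:
            key = ""
            while i < n and s[i] not in WS + "/>=":  # attribute name state
                key += s[i].lower()
                i += 1
            value = ""
            if i < n and s[i] == "=":
                i += 1
                if i < n and s[i] in "\"'":
                    quote = s[i]
                    i += 1
                    while i < n and s[i] != quote:
                        value += s[i]
                        i += 1
                    i += 1  # skip the closing quote
                else:
                    # Unquoted value: ends only at whitespace or ">",
                    # so a "/" here belongs to the value.
                    while i < n and s[i] not in WS + ">":
                        value += s[i]
                        i += 1
            attrs.setdefault(key, value)  # first occurrence wins
    return name, attrs, self_closing


print(tokenize_start_tag("<a href=/foobar.html>"))
print(tokenize_start_tag("<a href=/>"))
print(tokenize_start_tag("<br />"))
```

So `<a href=/>` comes out as an `a` start tag with `href="/"` and no self-closing flag. A rule that made every "/>" self-closing would have to override this unquoted-value reading.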
--
Yucca, http://www.cs.tut.fi/~jkorpela/