Re: [whatwg] several messages about XML syntax and HTML5

Michel Fortin Mon, 04 Dec 2006 08:11:25 -0800

On 4 Dec 2006, at 2:55, Ian Hickson wrote:

I've been having a lot of trouble following this discussion, because I
can't work out what it is that is being asked for. There seem to be
multiple discussions going on, and it isn't clear to me that everybody
really knows what they are arguing for or against.


This discussion is pretty confusing indeed.

I've changed the spec to allow a (meaningless) "xmlns" attribute on the
root <html> element, for the same reasons /> is allowed on void elements
now. I don't think it's a particularly useful thing, but I'm curious to
see what people think. (Like anything in the spec, we might remove it in
due course, based on real world experiences with the spec.)


I think that'll be useful.

Well, SVG itself would arguably be bad because it is poor from a
semantic standpoint.


HTML is poor from a semantic standpoint.


HTML is actually pretty rich, all things considered. SVG, on the other
hand, is media-specific and presentational.

<div> and <span> are poor from a semantic standpoint. They're still
useful for a variety of reasons and I see no one arguing they should
be excluded. I'm not saying SVG should or should not be added to
HTML, but I'm pretty sure inline SVG is useful too.

On Sat, 2 Dec 2006, Mike Schinkel wrote:

But please take into consideration that almost nobody writes web pages
using a DOM; they write web pages using text editors and dynamically
using string concatenation. As such there is great value for users in
having them be as similar as possible. If they converge, it will
accelerate chaos on the web.


With the addition of xmlns="" (see above), they are now as close as
possible, I believe.

That's probably all that can be done on the HTML side. But would
something bad happen if you were to make html:lang valid within XHTML?

On Sat, 2 Dec 2006, Elliotte Harold wrote:
What I don't understand is why some members of this working group are so
dead set on actively preventing HTML from being XML. The non-draconian
error handling I understand. But why are you disappointed that <!DOCTYPE
html> is well-formed XML? Why the active hostility to well-formedness?

I was initially disappointed that <!DOCTYPE html> is well-formed
because I thought that, if it weren't, it would allow HTML documents to
be differentiated from XHTML documents unambiguously (since XHTML
documents couldn't have it). That said, now I think it's probably
irrelevant.

The two formats are not the same, but for various reasons many people
have been trying to find common ground ever since XHTML was invented.
The result is a lot of HTML documents which are wrongly identified as
XHTML (because they're not even well-formed XML). So I think dropping
the HTML/XHTML identification string altogether is the right thing to
do; it's meaningless anyway because a lot of authors are careless.
Let's use the media type instead, the real thing browsers use to
differentiate the two, and force people to make things well-formed if
they want it called XHTML by the validator.

What I'm "hostile" towards is the fiction that you can take an XML
parser and attempt to parse an HTML document. The two formats aren't the
same, using the wrong parser is simply that, wrong.

I don't think many people really think this. I think those who say
that say that because they've been using some subset of HTML which is
compatible with XML for their own documents, therefore *their* HTML
documents can be sent through an XML parser. But I'm pretty sure
people on this list realise that this doesn't apply to the general case.
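
To illustrate what "compatible with XML" means here, below is a minimal
Python sketch, offered only as an illustration and not as anything the
spec prescribes. It feeds a document to a stock XML parser (the standard
library's xml.etree.ElementTree) and reports whether it is accepted. The
helper name is_well_formed_xml and both sample documents are
hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical document written in the XML-compatible subset:
# every element is closed and void elements use the "/>" form.
xml_compatible = (
    '<!DOCTYPE html>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Test</title></head>'
    '<body><p>Line one<br/>line two</p></body>'
    '</html>'
)

# Hypothetical document using ordinary HTML shortcuts:
# an unclosed <br> and an omitted </p>, which an XML parser rejects.
html_only = (
    '<!DOCTYPE html>'
    '<html><head><title>Test</title></head>'
    '<body><p>Line one<br>line two</body>'
    '</html>'
)

if __name__ == "__main__":
    print(is_well_formed_xml(xml_compatible))
    print(is_well_formed_xml(html_only))
```

A check like this only tells an author whether *their* document survives
an XML parser; it says nothing about HTML documents in general.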

http://wiki.whatwg.org/wiki/HTML_vs._XHTML#Differences_Between_HTML_and_XHTML


Nice resource. That could prove very handy.

The other half could be addressed by one little box in the corner of
Firefox's status bar that's a smiley face if the page is valid, and a
frown if it isn't.

A browser that shipped with a frowny face showing on 93% of pages would
do very badly in usability studies (and thus very badly in the market).

I just want to point out, in case someone is interested, that there is
actually a browser like this for the Mac: iCab [1].


 [1]: http://icab.de/

In the Web Apps 1.0 world, an HTTP message whose headers say text/html is
an HTML document, regardless of what sequence of bytes the body of the
message actually says. An HTTP message whose headers say text/xml, or use
some other XML MIME type, is an XML document. It's the MIME type that
decides how it is processed. If it is processed as an HTML document, then
it _is_ an HTML document, possibly with errors. So says the spec.


I just want to say that I like this definition very much.
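
The rule quoted above can be sketched in a few lines of Python. This is
a rough illustration under stated assumptions, not the spec's algorithm:
the function name classify and the exact set of XML media types checked
are hypothetical choices, and the Content-Type header is parsed with the
standard library's email.message.Message. Note that the message body is
never inspected; only the header decides.

```python
from email.message import Message
from typing import Optional, Tuple

# Assumed set of XML media types for this sketch; anything ending in
# "+xml" (such as application/xhtml+xml) is also treated as XML.
XML_MEDIA_TYPES = {"text/xml", "application/xml"}


def classify(content_type: str) -> Tuple[str, Optional[str]]:
    """Map a Content-Type header value to (document kind, charset).

    The kind is "html", "xml" or "other". The charset is the value of
    the charset media type parameter, or None when it is absent.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()   # lowercased "type/subtype"
    charset = msg.get_content_charset()   # lowercased, or None

    if media_type == "text/html":
        kind = "html"
    elif media_type in XML_MEDIA_TYPES or media_type.endswith("+xml"):
        kind = "xml"
    else:
        kind = "other"
    return kind, charset


if __name__ == "__main__":
    print(classify("text/html"))
    print(classify("application/xhtml+xml; charset=utf-8"))
```

The charset parameter is returned alongside the kind because the media
type is also the one place where the encoding can be declared in the
same way for both formats.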

On Sat, 2 Dec 2006, Michel Fortin wrote:
Having two markups poses the same problem as having two incompatible HD
DVD formats. Browsers do (or will) accept both formats, so as long as
the media type is known it'll work fine for them. But what about every
other piece of software in the middle that does not talk directly to the
browser?

That's the real difficulty when dealing with HTML and XHTML: the choice
isn't really about tools, it's a choice between two incompatible
exchange formats. That's the reason why I think it's compelling to have a
common subset between HTML and XHTML. If you can output something valid
for both HTML and XHTML at the same time, then you don't have to worry
about what format is supported on the other end.

The problem is that the common subset would be just that -- a subset. The
common subset of HTML and XHTML has very few useful features!

I don't see that as a problem. But before arguing the subset is or
isn't too tiny to be useful, shouldn't we care to define what the
subset actually is?

The only features of HTML I see that are not supported by the subset
are <base> vs. xml:base, and that you can't specify the encoding within
the file, because one uses <?xml encoding=""?> and the other uses <meta
http-equiv=""> (but the encoding can still be set as a media type
parameter). Setting the language is not in the valid part of the
subset, but if you don't care about validity on the XHTML side you
could just use html:lang, so I'll put it in the functional part of the
subset.

That doesn't leave much of HTML that can't be expressed by the
subset. Am I missing something? Which useful features aren't part of
the subset?

There are of course some differences in CSS and in scripting too, but
that's nothing that can't be worked around.

On Sat, 2 Dec 2006, Elliotte Harold wrote:

James Graham wrote:

Well I think you're hugely mistaken. Any model without support for
error recovery is not suitable for hand authoring (and only marginally
suitable for machine authoring).


You mean like almost every programming language ever invented? When's
the last time you saw error recovery in a C compiler?

JavaScript is a more apt example, since it's used on the Web... and it
has error recovery all over the place.

I think Elliotte's point is nonetheless valid. There are languages
pretty lax about the syntax, there are languages pretty strict about
the syntax; people can hand-write both kinds. The only important
thing is that they check that it works (by compiling, or by testing
the page in the browser with the same parser that will be used when it
is served) and that they be able to figure out what's wrong and fix it.

On Sun, 3 Dec 2006, Elliotte Harold wrote:
WordPress allows angle brackets. However I almost never use them.
Instead I use its markdown format. Most other users do the same, I
think. [...]
I suspect the others you mention are similar. I don't ever remember
using angle brackets on Blogger, but it's been a while.

It would be better to have hard data to work with, rather than having to
rely on our opinions of this. My own research does not suggest that most
authors use tools. That over three quarters of pages have major syntactic
errors leads me to suspect that tools are not going to save the syntax.

I concur with Ian here. Leaving comments on blogs and elsewhere often
requires me to add links as HTML. That doesn't mean that the blog
software won't fix any incorrect markup I've sent, however.

I'd add that even if it were true that hand authoring is a distinct
minority, do you have an idea how often people want to bypass
custom syntaxes and write raw HTML? I'd say pretty often. There's a
reason why Textile, Markdown and some other lightweight markup
syntaxes (as they are called on Wikipedia) have means to do so.




Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/
