On 4 Dec 2006, at 2:55, Ian Hickson wrote:

I've been having a lot of trouble following this discussion, because I
can't work out what it is that is being asked for. There seem to be
multiple discussions going on, and it isn't clear to me that everybody
really knows what they are arguing for or against.

This discussion is pretty confusing indeed.


I've changed the spec to allow a (meaningless) "xmlns" attribute on the root <html> element, for the same reasons /> is allowed on void elements now. I don't think it's a particularly useful thing, but I'm curious to see what people think. (Like anything in the spec, we might remove it in due course, based on real world experiences with the spec.)

I think that'll be useful.


Well, SVG itself would arguably be bad because it is poor from a
semantic standpoint.

HTML is poor from a semantic standpoint.

HTML is actually pretty rich, all things considered. SVG, on the other
hand, is media-specific and presentational.

<div> and <span> are poor from a semantic standpoint. They're still useful for a variety of reasons and I see no one arguing they should be excluded. I'm not saying SVG should or should not be added to HTML, but I'm pretty sure inline SVG is useful too.


On Sat, 2 Dec 2006, Mike Schinkel wrote:

But please take into consideration that almost nobody writes web pages
using a DOM; they write web pages using text editors and dynamically
using string concatenation. As such there is great value for users in
having them be as similar as possible. If they diverge, it will
accelerate chaos on the web.

With the addition of xmlns="" (see above), they are now as close as
possible, I believe.

That's probably all that can be done on the HTML side. But would something bad happen if you were to make html:lang valid within XHTML?


On Sat, 2 Dec 2006, Elliotte Harold wrote:

What I don't understand is why some members of this working group are so dead set on actively preventing HTML from being XML. The non-draconian error handling I understand. But why are you disappointed that <!DOCTYPE html> is well-formed XML? Why the active hostility to well-formedness?

I was initially disappointed that <!DOCTYPE html> is well-formed because I had thought it would make it possible to differentiate HTML from XHTML documents unambiguously (since XHTML documents couldn't have it). That said, I now think it's probably irrelevant.

The two formats are not the same, but ever since XHTML was invented, many people have been trying to find common ground between them for various reasons. The result is a lot of HTML documents which are wrongly identified as XHTML (they're not even well-formed XML). So I think dropping the HTML/XHTML identification string altogether is the right thing to do; it's meaningless anyway because a lot of authors are careless. Let's use the media type instead, the real thing browsers use to differentiate the two, and force people to make things well-formed if they want the validator to call it XHTML.


What I'm "hostile" towards is the fiction that you can take an XML parser and attempt to parse an HTML document. The two formats aren't the same;
using the wrong parser is simply that: wrong.

I don't think many people really believe this. I think those who say so do it because they've been using some subset of HTML which is compatible with XML for their own documents, and therefore *their* HTML documents can be sent through an XML parser. But I'm pretty sure people on this list realise that this doesn't apply to the general case.
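
To illustrate the distinction, here is a minimal Python sketch. The two sample documents and the helper names are made up for illustration, and the standard library's html.parser is only a stand-in for a real HTML parser (it is not the parsing algorithm from the spec). It shows an XML parser accepting a document written in the XML-compatible subset and rejecting ordinary tag soup, while the HTML parser copes with both.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample documents, not taken from the thread.
XML_COMPATIBLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    "<body><p>Hello<br/>world</p></body></html>"
)
TAG_SOUP = "<html><head><title>t</title><body><p>Hello<br>world"


def parses_as_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record the start tags seen by the (forgiving) HTML parser."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    """Feed the markup to the HTML parser and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    for name, doc in (("subset", XML_COMPATIBLE), ("tag soup", TAG_SOUP)):
        print(name, "| XML parser accepts:", parses_as_xml(doc),
              "| HTML parser saw:", html_start_tags(doc))
```

Only documents deliberately kept inside the subset survive the XML parser, which is the point about the general case.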

http://wiki.whatwg.org/wiki/HTML_vs._XHTML#Differences_Between_HTML_and_XHTML

Nice resource. That could prove very handy.


The other half could be addressed by one little box in the corner of
Firefox's status bar that's a smiley face if the page is valid, and a
frown if it isn't.

A browser that shipped with a frowny face showing on 93% of pages would do
very badly in usability studies (and thus very badly in the market).

I just want to point out in case someone is interested that there is actually a browser like this for the Mac: iCab [1].

 [1]: http://icab.de/


In the Web Apps 1.0 world, an HTTP message whose headers say text/html is
an HTML document, regardless of what sequence of bytes the body of the
message actually contains. An HTTP message whose headers say text/xml, or use
some other XML MIME type, is an XML document. It's the MIME type that
decides how it is processed. If it is processed as an HTML document, then
it _is_ an HTML document, possibly with errors. So says the spec.

I just want to say I like this definition very much.
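
As a rough sketch of that rule in Python: the function name, the returned labels, and the exact set of XML media types below are my own assumptions, not anything from the spec. The processing mode is picked from the Content-Type header alone, and the body is never inspected.

```python
from email.message import Message

# Assumed list of XML media types, for illustration only.
XML_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}


def processing_mode(content_type_header: str) -> str:
    """Pick a processing mode from the Content-Type header value alone."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() lowercases the type and drops parameters
    # such as charset.
    media_type = msg.get_content_type()
    if media_type == "text/html":
        return "html"
    if media_type in XML_TYPES or media_type.endswith("+xml"):
        return "xml"
    return "other"


if __name__ == "__main__":
    for header in ("text/html; charset=utf-8",
                   "application/xhtml+xml",
                   "text/xml",
                   "image/png"):
        print(header, "->", processing_mode(header))
```

A body full of XHTML-looking markup sent as text/html would still come out as "html" here, which is exactly what the definition says.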


On Sat, 2 Dec 2006, Michel Fortin wrote:

Having two markups poses the same problem as having two incompatible HD
DVD formats. Browsers do (or will) accept both formats, so as long as
the media type is known it'll work fine for them. But what about every other piece of software in the middle that does not talk directly to the
browser?

That's the real difficulty when dealing with HTML and XHTML: the choice
isn't really about tools, it's a choice between two incompatible
exchange formats. That's the reason why I think it's compelling to have a common subset between HTML and XHTML. If you can output something valid for both HTML and XHTML at the same time, then you don't have to worry
about what format is supported on the other end.

The problem is that the common subset would be just that -- a subset. The
common subset of HTML and XHTML has very few useful features!

I don't see that as a problem. But before arguing about whether the subset is too tiny to be useful, shouldn't we take care to define what the subset actually is?

The only features of HTML I see that are not supported by the subset are <base> (vs. xml:base) and specifying the encoding within the file, because one uses <?xml encoding=""?> and the other uses <meta http-equiv=""> (but the encoding can still be set as a media type parameter). Setting the language is not in the valid part of the subset, but if you don't care about validity on the XHTML side you could just use html:lang, so I'll put it in the functional part of the subset.

That doesn't leave much of HTML that can't be expressed by the subset. Am I missing something? Which useful features aren't part of the subset?

There are of course some differences in CSS and in scripting too, but that's nothing that can't be worked around.


On Sat, 2 Dec 2006, Elliotte Harold wrote:

James Graham wrote:

Well I think you're hugely mistaken. Any model without support for
error recovery is not suitable for hand authoring (and only marginally
suitable for machine authoring).

You mean like almost every programming language ever invented? When's
the last time you saw error recovery in a C compiler?

JavaScript is a more apt example, since it's used on the Web... and it has
error recovery all over the place.

I think Elliotte's point is nonetheless valid. There are languages that are pretty lax about syntax and languages that are pretty strict about it; people can hand-write both kinds. The only important thing is that they check that it works (by compiling, or by testing the page in the browser with the same parser they intend to serve it to) and that they are able to figure out what's wrong and fix it.


On Sun, 3 Dec 2006, Elliotte Harold wrote:

WordPress allows angle brackets. However I almost never use them. Instead I
use its markdown format. Most other users do the same, I think. [...]

I suspect the others you mention are similar. I don't ever remember using
angle brackets on Blogger, but it's been a while.

It would be better to have hard data to work with, rather than having to rely on our opinions of this. My own research does not suggest that most authors use tools. That over three quarters of pages have major syntactic errors leads me to suspect that tools are not going to save the syntax.

I concur with Ian here. Leaving comments on blogs and elsewhere often requires me to add links as HTML. That doesn't mean the blog software won't fix any incorrect markup I've sent, however.

I'd add that even if it were true that hand authoring is a distinct minority, do you have any idea how often people want to bypass custom syntaxes and write raw HTML? I'd say pretty often. There's a reason why Textile, Markdown and some other lightweight markup languages (as Wikipedia calls them) have a means to do so.



Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/

