On Dec 2, 2006, at 14:02, Elliotte Harold wrote:
Lachlan Hunt wrote:
HTML and XML have significantly different parsing requirements and
they absolutely must be treated as significantly different file
formats. Any attempt to treat them as the same format is an
extremely bad idea.
That's only true to the extent that some people seem to insist on
making them needlessly different. HTML is tantalizingly close to
well-formed XML. They both derive from SGML. They both use angle
bracketed tags. They both define a tree structure. Indeed in many
cases an HTML document is an XML document.
But the point is that the text/html processing model has to work with
the real Web where not all documents are well-formed.
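To illustrate the difference in processing models, here is a minimal sketch using only Python's standard library. The sample markup and the helper names (`xml_accepts`, `html_start_tags`, `TagCollector`) are made up for illustration, and `html.parser` is only a lenient tokenizer standing in for a text/html parser, not an implementation of any specified HTML parsing algorithm.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: typical tag soup, with no end tags at all.
TAG_SOUP = "<p>Hello<br>world"


def xml_accepts(markup: str) -> bool:
    """Return True if an XML processor accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Draconian error handling: the first well-formedness error is fatal.
        return False


class TagCollector(HTMLParser):
    """Records start tags; tolerates missing end tags instead of failing."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    """Feed the markup to the lenient HTML tokenizer and list its start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

With these definitions, `xml_accepts(TAG_SOUP)` is false while `html_start_tags(TAG_SOUP)` still reports the `p` and `br` elements. The same content written as `<p>Hello<br/>world</p>` passes the XML processor.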
This enables the use of the very powerful XML toolchain for
processing HTML.
You can use the toolchain, except for the XML processor itself, as I
have explained before.
What I don't understand is why some members of this working group
are so dead set on actively preventing HTML from being XML. The
non-draconian error handling I understand. But why are you disappointed
that <!DOCTYPE html> is well-formed XML? Why the active hostility
to well-formedness?
To make a conformance checker not accidentally let MIME type mistakes
silently pass in some cases.
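One way to read that point, as a minimal sketch with Python's standard library: the sample documents and the helper name `is_well_formed_xml` are assumptions for illustration, and this is not how any actual conformance checker is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document meant as text/html, but with every tag closed.
NEW_DOCTYPE_DOC = (
    "<!DOCTYPE html>"
    "<html><head><title>t</title></head><body><p>x</p></body></html>"
)

# Same document with an HTML 4.01 doctype. XML requires a system
# identifier after a PUBLIC identifier, so this doctype is not
# well-formed XML.
OLD_DOCTYPE_DOC = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'
    "<html><head><title>t</title></head><body><p>x</p></body></html>"
)


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML processor parses the markup without error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Here `is_well_formed_xml(NEW_DOCTYPE_DOC)` is true, so an XML processor raises no well-formedness error if such a document is labeled with an XML MIME type by mistake. `is_well_formed_xml(OLD_DOCTYPE_DOC)` is false, so the same mistake is caught at the doctype.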
--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/