On 12/2/06, Henri Sivonen <[EMAIL PROTECTED]> wrote:
On Dec 2, 2006, at 18:24, Sam Ruby wrote:

> It would not be wise for HTML5 to limit itself to the more constrained
> character set of XML.  In particular, the form feed character is
> pretty popular,

BTW, I copied and pasted the wrong table.  The characters I mentioned
are merely discouraged (and include such things as Microsoft smart quotes
mislabeled as iso-8859-1).  The set actually allowed in XML 1.0 is as
follows:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

For XML 1.1 the list is as follows:

[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
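As a rough illustration (a sketch, not taken from either spec), the two productions above can be written as predicates over code points. The names `is_xml10_char`, `is_xml11_char` and `disallowed` are hypothetical helpers. The form feed (#xC) fails the XML 1.0 test and passes the XML 1.1 one. Note that XML 1.1 also lists #xC among its RestrictedChar range, which must appear as a character reference (&#xC;) rather than literally.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0 (#x9 | #xA | #xD | ranges)."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """Char production of XML 1.1 (everything but NUL, surrogates, #xFFFE/#xFFFF)."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def disallowed(text: str, is_char) -> list:
    """Return the hex code points in text that the given predicate rejects."""
    return [hex(ord(ch)) for ch in text if not is_char(ord(ch))]


# A form feed is rejected by XML 1.0 but accepted by XML 1.1.
print(disallowed("page one\fpage two", is_xml10_char))
print(disallowed("page one\fpage two", is_xml11_char))
```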

> This is yet another case where "take HTML5, read it into a DOM, and
> serialize it as XML, and voilà: you have valid XHTML" doesn't work.

What I am advocating is making sure that *conforming* HTML5 documents
can be serialized as XHTML5 without data loss.

Then you will also need to disallow newlines in attribute values.
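A minimal sketch of the problem with newlines in attribute values, using Python's standard `xml.etree.ElementTree`, whose expat-based parser applies XML attribute-value normalization: a newline written literally inside an attribute value comes back as a space, while a numeric character reference such as &#10; is preserved. The helper name `parsed_attr` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parsed_attr(xml_text: str, name: str) -> str:
    """Parse xml_text as XML and return the named attribute of the root element."""
    return ET.fromstring(xml_text).get(name)


original = "line one\nline two"

# Newline written literally: attribute-value normalization turns it into a space.
literal = parsed_attr('<p title="line one\nline two"/>', "title")

# Newline written as a character reference: it survives parsing.
escaped = parsed_attr('<p title="line one&#10;line two"/>', "title")

print(repr(literal))
print(repr(escaped))
print(literal == original, escaped == original)
```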

In any case, I understand the desire; my read is that the WG's desire
for backwards compatibility is higher.  Limiting the character set to
the allowable XML 1.1 character set should not be a problem for
backwards compatibility purposes.

- Sam Ruby
