On 12/2/06, Henri Sivonen <[EMAIL PROTECTED]> wrote:
On Dec 2, 2006, at 18:24, Sam Ruby wrote:
> It would not be wise for HTML5 to limit itself to the more constrained
> character set of XML. In particular, the form feed character is
> pretty popular,
BTW, I copied and pasted the wrong table. The characters I mentioned
were discouraged (and include such things as Microsoft smart quotes
mislabeled as iso-8859-1). The actual allowed set in XML 1.0 is as
follows:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
For XML 1.1 the list is as follows:
[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
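
For anyone who wants to check a document against these two ranges, here is a rough sketch in Python. The function names (is_xml10_char, is_xml11_char, disallowed) are made up for illustration and are not from any library; the ranges are copied straight from the two lists above. It only tests membership in the allowed set and says nothing about which characters XML 1.1 additionally requires to be written as character references.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp is in the XML 1.0 allowed set quoted above."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if code point cp is in the XML 1.1 allowed set quoted above."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def disallowed(text: str, is_char) -> list:
    """List (index, hex code point) for each character rejected by is_char."""
    return [
        (i, hex(ord(ch)))
        for i, ch in enumerate(text)
        if not is_char(ord(ch))
    ]


if __name__ == "__main__":
    sample = "page one\x0cpage two"  # contains a form feed (U+000C)
    print("XML 1.0:", disallowed(sample, is_xml10_char))
    print("XML 1.1:", disallowed(sample, is_xml11_char))
```

The form feed is the interesting case: it falls outside the 1.0 set but inside the 1.1 set, which is why the choice between the two lists matters here.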
> This is yet another case where "take HTML5, read it into a DOM, and
> serialize it as XML, and voilà: you have valid XHTML" doesn't work.
What I am advocating is making sure that *conforming* HTML5 documents
can be serialized as XHTML5 without data loss.
Then you will also need to disallow newlines in attribute values.
In any case, I understand the desire; my read is that the WG's desire
for backwards compatibility is stronger. Limiting the character set to
the allowable XML 1.1 character set should not be a problem for
backwards compatibility purposes.
- Sam Ruby