On 12/1/06, Elliotte Harold <[EMAIL PROTECTED]> wrote:
Henri Sivonen wrote:

>> 6. Are noncharacters U+FDD0..U+FDEF allowed (?)
>> 7. Are the noncharacters from the last two characters of each plane
>> allowed (?)
>
> I don't have particularly strong feelings here. Putting those characters
> in HTML is a bad idea, but allowing them is not a problem for HTML5 to
> XHTML5 conversion and they aren't a common problem like C1 controls.

FFFE and FFFF are specifically forbidden by XML so they should probably
be forbidden here too. I think the others are allowed.
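Questions 6 and 7 cover all 66 Unicode noncharacters: the contiguous block U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (which includes U+FFFE and U+FFFF). A minimal Python sketch of that test, assuming the Unicode definition of noncharacters; the function name is_noncharacter is hypothetical:

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    # The contiguous block U+FDD0..U+FDEF (32 code points).
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+FFFE/U+FFFF,
    # U+1FFFE/U+1FFFF, ..., U+10FFFE/U+10FFFF (34 code points).
    # Masking with 0xFFFE keeps bits 1..15, so only xxFFFE and xxFFFF match.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


if __name__ == "__main__":
    print(sum(is_noncharacter(cp) for cp in range(0x110000)))  # 66
```

The bit mask avoids spelling out 17 separate pairs of ranges.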

Unicode (not XML) reserves U+D800 – U+DFFF as well as U+FFFE and U+FFFF.

XML 1.0 only allows #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF], and it further discourages the following characters:

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
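The discouraged list is regular enough to generate rather than type out: two runs of C1 controls (note that U+0085 is not listed), the U+FDD0 block, and the last two code points of planes 1 through 16. A sketch that transcribes the ranges exactly as quoted; the names DISCOURAGED and is_discouraged are hypothetical:

```python
# Ranges XML 1.0 discourages, transcribed from the list quoted in this message.
DISCOURAGED = [(0x7F, 0x84), (0x86, 0x9F), (0xFDD0, 0xFDDF)] + [
    # Last two code points of planes 1..16, e.g. (0x1FFFE, 0x1FFFF).
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF)
    for plane in range(1, 17)
]


def is_discouraged(cp: int) -> bool:
    """True if cp falls in one of the discouraged ranges above."""
    return any(lo <= cp <= hi for lo, hi in DISCOURAGED)


if __name__ == "__main__":
    print(len(DISCOURAGED))                            # 19
    print(is_discouraged(0x85), is_discouraged(0x86))  # False True
```

Plane 0's U+FFFE and U+FFFF do not appear here because XML 1.0 excludes them outright rather than merely discouraging them.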

It would not be wise for HTML5 to limit itself to the more constrained
character set of XML. In particular, the form feed character (#xC),
which XML 1.0 does not allow at all, is pretty popular.
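Whether a given text can survive as XML 1.0 comes down to the Char production, #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF], which rules out the form feed even as a character reference. A minimal Python sketch that reports the offending code points; the helper names is_xml10_char and non_xml_chars are hypothetical:

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def non_xml_chars(text: str) -> list[int]:
    """Return the code points in text that no XML 1.0 document may contain."""
    return [ord(ch) for ch in text if not is_xml10_char(ord(ch))]


if __name__ == "__main__":
    html_text = "Page one\fPage two"  # \f is the form feed, U+000C
    print([hex(cp) for cp in non_xml_chars(html_text)])  # ['0xc']
```

Tab, line feed, and carriage return are the only C0 controls that pass; everything else below U+0020 is rejected.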

This is yet another case where "take HTML5, read it into a DOM, and
serialize it as XML, and voilà: you have valid XHTML" doesn't work.
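The failure is easy to reproduce with Python's standard library, using xml.dom.minidom as a stand-in DOM: it accepts a form feed in a text node and writes it out unchanged, but the serialized result is not well-formed, so parsing it back fails. This is only a sketch; the round_trip helper is hypothetical, and a real HTML5 pipeline would use an HTML parser and its own serializer.

```python
from xml.dom.minidom import Document, parseString
from xml.parsers.expat import ExpatError


def round_trip(text: str) -> bool:
    """Put text in a DOM, serialize it as XML, and try to parse it back.
    Returns True only if the serialized output is well-formed XML."""
    doc = Document()
    root = doc.createElement("p")
    root.appendChild(doc.createTextNode(text))
    doc.appendChild(root)
    # toxml() escapes markup characters such as & and <, but does not
    # check for characters outside the XML 1.0 Char production.
    serialized = doc.toxml()
    try:
        parseString(serialized)
        return True
    except ExpatError:
        return False


if __name__ == "__main__":
    print(round_trip("Page one"))            # True
    print(round_trip("Page one\fPage two"))  # False
```

The DOM holds the character without complaint; the problem only surfaces when something tries to read the XML back.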

--
Elliotte Rusty Harold  [EMAIL PROTECTED]

- Sam Ruby
