On Dec 1, 2006, at 14:38, Elliotte Harold wrote:
1. Are private use characters allowed?
I think the answer should be "Yes", because not allowing them could
make people subvert Unicode and use e.g. Latin-1 code points for a
different purpose with a bogus font. Also, not allowing them would be
a violation of Charmod requirements for specs.
2. Are control characters allowed (probably yes, based on other
parts of the spec).
Personally, I'd like to make non-conforming the control characters
that XML 1.0 disallows (in order to keep conforming HTML5 documents
convertible to XHTML5) as well as C1 controls (because they have no
legitimate use in HTML but are a sign of a common bug).
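
To make the proposal above concrete, here is a minimal Python sketch of such a check. The function names are hypothetical and come from no spec or library. It assumes that "the control characters that XML 1.0 disallows" means the C0 controls outside the XML 1.0 Char production (everything below U+0020 except TAB, LF and CR), and that "C1 controls" means U+0080..U+009F.

```python
def is_xml10_forbidden_control(cp: int) -> bool:
    """C0 controls that XML 1.0 does not allow (all except TAB, LF, CR)."""
    return cp < 0x20 and cp not in (0x09, 0x0A, 0x0D)


def is_c1_control(cp: int) -> bool:
    """C1 controls, U+0080..U+009F."""
    return 0x80 <= cp <= 0x9F


def flagged_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each control character the proposal would flag."""
    flagged = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if is_xml10_forbidden_control(cp) or is_c1_control(cp):
            flagged.append((index, f"U+{cp:04X}"))
    return flagged
```

For example, flagged_controls("a\x00b\tc\x85") reports the NUL and the U+0085 but leaves the TAB alone.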
3. Are surrogate characters allowed? (probably no)
Surrogates are an artifact of UTF-16. They have no place on the
character level. So I'd say "No".
6. Are noncharacters U+FDD0..U+FDEF allowed (?)
7. Are the noncharacters from the last two characters of each plane
allowed (?)
I don't have particularly strong feelings here. Putting those
characters in HTML is a bad idea, but allowing them is not a problem
for HTML5 to XHTML5 conversion and they aren't a common problem like
C1 controls.
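
For reference, the other code point classes discussed above can be told apart with simple range tests. This is only a sketch with hypothetical function names. The ranges are the ones Unicode defines: surrogates U+D800..U+DFFF, private use U+E000..U+F8FF plus planes 15 and 16 up to their last two code points, and noncharacters U+FDD0..U+FDEF plus the last two code points of every plane.

```python
def is_surrogate(cp: int) -> bool:
    """Surrogate code points, U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_private_use(cp: int) -> bool:
    """Private use: BMP area plus planes 15 and 16 (minus their noncharacters)."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, or U+xxFFFE / U+xxFFFF at the end of any plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE
```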
--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/