On 07/12/2003 15:40, Philippe Verdy wrote:

Of course there is an even simpler way to provide the glue I was talking about. W3C simply needs to relax the rule forbidding combining marks at the start of a string (and interpret the one precomposed character with ">" as base as if it were decomposed, as I suggested before), and, remembering that use of NFC is a strong recommendation rather than a requirement, not insist on NFC in such cases. Then nothing needs to be added to Unicode.
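For readers who want to see the interaction concretely, here is a minimal sketch using Python's standard unicodedata module. It assumes element content that begins with U+0338 COMBINING LONG SOLIDUS OVERLAY, and it treats "combining mark" as "nonzero canonical combining class", which is a simplification rather than the W3C's own definition. The helper name and the sample markup are hypothetical.

```python
import unicodedata

# U+0338 combines canonically with ">" (U+003E) to form U+226F NOT GREATER-THAN.
COMBINING_LONG_SOLIDUS_OVERLAY = "\u0338"


def starts_with_combining_mark(text: str) -> bool:
    """Return True if the first code point has a nonzero combining class.

    Simplifying assumption: this is only an approximation of what a
    normalization-aware markup checker would test for.
    """
    return bool(text) and unicodedata.combining(text[0]) != 0


# Hypothetical element content that begins with a combining mark.
content = COMBINING_LONG_SOLIDUS_OVERLAY + "x"
markup = "<p>" + content + "</p>"

# Normalizing the serialized markup as a whole composes the tag's closing
# ">" with the following U+0338, so the start tag is no longer intact.
normalized = unicodedata.normalize("NFC", markup)
tag_survives = normalized.startswith("<p>")

print(starts_with_combining_mark(content))
print(tag_survives)
```

The point of the sketch is only that the text of an element and the markup around it stop being independent once the whole serialized document is normalized, which is the situation the rule on leading combining marks is meant to avoid.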



There's little chance that this will be relaxed by the W3C, because HTML is now XML (XHTML is the current recommended standard, HTML 4.01 is just kept as is, and all other extensions since XHTML 1.1 are being developed as modules with DTDs or XML schemas), and because XML text elements are independent. What you propose would break the XML containment model. (Could it be implemented, however, in XSLT transforms from XHTML? I doubt it, because the output of XSLT is also XML, even if it does not always produce an XML syntax, but only a DOM-parsable tree or InfoSet...)





Well, this is W3C's problem. They seem to have backed themselves into a corner which they need to get out of but have no easy way of doing so. Unicode is of course very familiar with this kind of situation, e.g. with character name errors, combining class errors, 11000+ redundant Korean characters without decompositions, etc. So no doubt it can extend its sympathy, and possibly even offer to help by encoding the kind of character I was suggesting earlier (perhaps in exchange for some W3C readiness to accept correction of errors in the normalisation data?). But really this is not a Unicode issue.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




