On Fri, 30 May 2014 12:26:18 -0600 Karl Williamson <[email protected]> wrote:
> I'm having a problem with this
> http://www.unicode.org/versions/corrigendum9.html
> Some people now think it means that noncharacters are really no
> different from private-use characters, and should be treated very
> similarly if not identically.
> It seems to me that they should be illegal in open interchange, or
> perhaps illegal in interchange without prior agreement.

So one just puts a notice on the web site saying that by downloading CLDR files one agrees to accept non-characters.

Part of the original problem is that the CLDR mechanism for identifying Unicode scalar values in XML, rather than quoting them (albeit by numeric entities), was broken.

> Thus, I don't see how noncharacters can be considered to be valid in
> public interchange, given that the producers have to assume that the
> consumers will not accept them.

The publishing of the CLDR data was strictly limited to the Milky Way, and will remain so for several decades at the very least. Therefore it was not public interchange.

Practically, there is the very real issue that a system may be useful enough to be used as part of a larger system, and therefore be called upon to handle any Unicode scalar value.

One possible solution is to use lone low surrogates instead of non-characters. These have the advantage of having obvious representations in all three encoding forms (UTF-8, UTF-16 and UTF-32). Of course, internal checks on the well-formedness of Unicode strings would have to be relaxed, and one might prefer to use them doubled in UTF-16 so as not to weaken checks for broken strings.

Richard.

_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
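For readers who want to see the lone-low-surrogate suggestion in bytes, here is a minimal Python sketch. It is an illustration only, not anything proposed in the thread: the choice of U+DC80 as the sentinel and the names `SENTINEL`, `encode_all`, `strict_utf8_ok` and `is_noncharacter` are hypothetical. Python's built-in `"surrogatepass"` error handler stands in for the "relaxed well-formedness checks" the message mentions. The sketch also contrasts this with a noncharacter such as U+FFFF, which a strict UTF-8 codec accepts because it is a Unicode scalar value, whereas a surrogate code point is not.

```python
# Sketch only: a lone low surrogate used as an in-band sentinel (hypothetical
# choice of U+DC80), serialized in the three Unicode encoding forms.
# "surrogatepass" lets Python's UTF codecs emit and accept surrogate code
# points that strict encoding/decoding rejects.

SENTINEL = "\udc80"  # hypothetical sentinel: a lone low surrogate

def encode_all(s: str) -> dict:
    """Encode s in UTF-8, UTF-16LE and UTF-32LE, letting surrogates through."""
    return {
        "utf-8": s.encode("utf-8", "surrogatepass"),
        "utf-16-le": s.encode("utf-16-le", "surrogatepass"),
        "utf-32-le": s.encode("utf-32-le", "surrogatepass"),
    }

def strict_utf8_ok(s: str) -> bool:
    """True if a strict (default) UTF-8 encoder accepts s."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF."""
    return (0xFDD0 <= cp <= 0xFDEF) or ((cp & 0xFFFE) == 0xFFFE and cp <= 0x10FFFF)

if __name__ == "__main__":
    for form, data in encode_all(SENTINEL).items():
        print(form, data)  # utf-8 b'\xed\xb2\x80', utf-16-le b'\x80\xdc', ...
    # A noncharacter is a scalar value, so strict UTF-8 accepts it;
    # a lone surrogate is not, so strict UTF-8 rejects it.
    print(strict_utf8_ok("\uffff"), strict_utf8_ok(SENTINEL))  # True False
    # Doubled in UTF-16: two adjacent low surrogates never occur in
    # well-formed UTF-16, nor in a valid string cut in the middle of a pair.
    print((SENTINEL * 2).encode("utf-16-le", "surrogatepass"))  # b'\x80\xdc\x80\xdc'
```

The doubled form is one reading of the message's closing suggestion: a checker could accept exactly the pattern "low surrogate, low surrogate" while still flagging a single unpaired surrogate as a broken string.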

