2012/6/2 William_J_G Overington <[email protected]>:
> There is a paradox in that, at present, in order for a new electronic
> character-based communication technology to become introduced into regular
> Unicode, evidence of its existing widespread use in a Private Use Area
> context is needed: yet producing that existing widespread use in a Private
> Use Area context is unrealistic, both because it would be a Private Use
> Area implementation and because that very Private Use Area implementation
> would damage the implementation and use of a regular Unicode solution for
> many years.
>
> The point is that such new technologies need to be introduced in a process
> that is managed by the Unicode and ISO committees. For Unicode, the code
> points could be encoded by the Unicode Technical Committee, yet the
> individual encodings using those code points could be carried out by
> another Unicode committee, which particular committee being a matter to be
> decided.
The paradox is apparently solved by having books printed and made widely available (or old enough that their content has fallen into the public domain and is freely reusable, so that copies start spreading in various communities). In earlier eras, you did not need characters to be encoded: you just drew them with a brush, cast them in metal type, or cut them into wood with a knife. Now we want new characters to be encoded, but we are frozen by copyright and political issues. When new characters are introduced, font technologies may be used, but only with limited spread (widespread use requires converting plain text into rendered files such as PDFs, or using embedded graphics in rich-text documents). But now UTC members are saying that these characters are not necessary because they are graphic files, as if the need for use in printed documents no longer existed (even though those documents are not graphic files either, but full pages where all glyphs are rendered the same way, and applications like OCR will choke on unknown glyphs found in those books).

So the real question is: why do we need "plain text"? It is to allow full-text search, indexing, and transformation of the content of those documents; to allow further work on those documents to create derived documents more easily (even if the page layouts are largely transformed); to create translations; and to integrate those "data" elements into other contexts. Plain text is just that: being able to extract a parsable semantic from a rendered text. Initially, no books are plain text; they are always graphic. But they are still used as strong evidence for encoding. I don't understand the discrimination between glyphs stored as computer graphics and printed books (or other real artistic/cultural works on various materials such as stone, wood and ceramics) as a valid source and evidence for encoding.
Flags are a good example, because they exist in various materialized forms, not just in computer graphics! And they are used in a lot of very different contexts. They are perfect candidates for encoding, except that their colors cause problems for the encoding model (mostly for the representative glyph), as do their graphic designs, which are protected, restricted, or even forbidden from most uses in some countries. But just like other characters, or like languages, flags can be unified across their allowed variations while still allowing additional variations to be encoded (in Unicode we have variation selectors; in language codes we have additional subtags; in ISO 3166 we likewise have subcodes that can be appended to existing codes). All these unifications and encoded variations require a specific registry. But unlike character variants, which remain basic glyphs, or country codes, which remain codes composed of ordinary characters, flags are unique in their colors and only meaningful through their graphic design. A solution for their unification will therefore require such a registry and a convention for naming flags in it. We can start with a modest set of codes, but given the huge number of existing flags, and the fact that the UTC and the CLDR TC have no competence in this domain (while other groups started collecting data many years before Unicode even began to work on it...), I will not suggest that the Unicode Consortium (nor WG2 at ISO) host this registry. There has long been an established large federation of associations that has produced research, websites, and data (with large parts of it freely available). But the most frequent use of flags still involves only a small number of them. And we can already integrate those, with a model that will also allow an easy transition to other large collections of flags. Please be pragmatic here!
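The idea of unifying flags through codes appended to existing code systems can be sketched with the regional indicator symbols that Unicode does already encode (U+1F1E6..U+1F1FF, one per Latin letter): a flag is represented in plain text as a pair of these symbols spelling an ISO 3166-1 alpha-2 country code. This is only an illustration of that encoding model, not the full registry proposed above, and whether a renderer shows an actual flag image for the pair is entirely up to the font and platform.

```python
# Sketch: represent a flag in plain text as a pair of Unicode
# regional indicator symbols (U+1F1E6..U+1F1FF), one per letter
# of an ISO 3166-1 alpha-2 country code. The text stays parsable
# and searchable; rendering as a flag glyph is platform-dependent.
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(country_code: str) -> str:
    """Map an ISO 3166-1 alpha-2 code (e.g. 'FR') to a pair of
    regional indicator symbols."""
    cc = country_code.upper()
    if len(cc) != 2 or not cc.isascii() or not cc.isalpha():
        raise ValueError("expected a two-letter ISO 3166-1 code")
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in cc)

print(flag("FR"))                            # U+1F1EB U+1F1F7
print([hex(ord(c)) for c in flag("JP")])     # ['0x1f1ef', '0x1f1f5']
```

Note how the underlying country code can always be recovered from the character pair, which is exactly the "parsable semantic" that embedded graphics cannot provide.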
Admit that the "plain-text" need exists (even if it is currently met, with some difficulty, using embedded graphics that are not easily parsable and not always interoperable, due to their internal formats not being supported across all platforms, as well as their size, so that they cannot always be embedded). Don't lie to yourself by saying that nobody has ever wanted flags to be encoded. Even if they are not encoded in Unicode, they are encoded using various private-use schemes in lots of systems, but those schemes are not interoperable (and this is the main reason why you don't see them exposed in "strange" encodings: those flags are usually not used exclusively when creating large texts, but are spread within larger texts or data tables).

