Jörg Knappen <knappen at uni dash mainz dot de> wrote:

> I see a precedent in Unicode to treat Copyright-like sign differently
> from simple encircled letters:
>
> Unicode takes precautions not to encode the same character twice.
> Therefore, superscript digits 2 and 3 are absent from the superscript
> block U+2070 ff.
>
> However, the Block enclosed alphanumerics U+2460 ff includes encircled
> capital latin letters C, P, and R in addition to the copyright-like
> sign elsewhere.
OK, I guess I need some guidance from the Unicode elder statesmen and greater experts.

I have been under the impression all along that what Jörg calls "copyright-like signs," meaning U+00A9 and U+00AE and U+2117 and possibly others, are encoded as separate entities primarily because they were in pre-existing legacy character sets. Remember that a major goal of Unicode at its inception was to make sure all such character sets were covered. Obviously U+00A9 and U+00AE were in ISO 8859-1, at those same code points. They also appeared in MS-DOS code page 850, which also predated Unicode. I don't know if U+2117 was in any existing standards; I just know it's in my Unicode 1.0 book.

Jörg's comments imply that these symbols are in Unicode because of a policy or "precedent" for treating such symbols specially, not (or not only) because of the policy of encoding whatever was in the legacy character sets of the time.

Let's suppose we were back in the mid-'90s, and for whatever reason, the circled Latin letters in the U+24xx block were already encoded but the three "copyright-like signs" were not. Suppose they weren't in any legacy character sets either. (Use your imagination.)

Now suppose someone proposed that the circled-C copyright symbol (picking the most widely used example) be encoded as a separate entity. Suppose further that someone else pointed out that it could be represented by one of the circled Latin letters in the U+24xx zone (Ⓒ or ⓒ), and a debate ensued over whether those letters were of the correct size. Finally, let's suppose that someone else suggested using the combination U+0043 (or U+0063) plus U+20DD, the combining enclosing circle, and that we then had a debate over whether fonts and rendering engines were up to the task.

What would UTC and WG2 do? Would they choose to encode COPYRIGHT SIGN on its own, recommend the existing circled Latin letters, or recommend the combining sequence? Why? (Use a separate sheet of paper if necessary.)
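For anyone who wants to poke at how the three alternatives in this thought experiment actually differ in the Unicode Character Database today, here is a small sketch using Python's standard `unicodedata` module. The choice of Python and the dictionary labels are mine, purely for illustration, and the results reflect whatever Unicode version your Python build ships with. It prints each candidate's code points, character names, and decomposition mappings, plus its NFKC form.

```python
import unicodedata

# Illustrative labels (hypothetical) for the candidate representations
# of a circled-C copyright symbol discussed above.
CANDIDATES = {
    "dedicated sign": "\u00a9",          # COPYRIGHT SIGN
    "circled capital letter": "\u24b8",  # CIRCLED LATIN CAPITAL LETTER C
    "circled small letter": "\u24d2",    # CIRCLED LATIN SMALL LETTER C
    "combining sequence": "C\u20dd",     # LATIN CAPITAL LETTER C + COMBINING ENCLOSING CIRCLE
}


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, character name, decomposition mapping) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))
        for ch in text
    ]


for label, text in CANDIDATES.items():
    # NFKC applies compatibility decompositions, so it reveals which
    # candidates are treated as mere variants of a plain letter.
    folded = unicodedata.normalize("NFKC", text)
    print(f"{label}:")
    for cp, name, decomp in describe(text):
        print(f"  {cp} {name} decomposition={decomp or '(none)'}")
    print(f"  NFKC -> {' '.join(f'U+{ord(c):04X}' for c in folded)}")
```

The circled letters carry a `<circle>` compatibility decomposition and fold to a plain C under NFKC, whereas COPYRIGHT SIGN has no decomposition at all and survives NFKC unchanged; the combining sequence is also left as-is.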
-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

