Hey thanks - I think I've got all that now. Of course, I'm tempted to wonder whether it would have made more sense simply to introduce a few new combining characters in plane 0, such as: "make bold", "make italic", "make script", "make fraktur", "make double-struck", "make sans serif", "make monospace" and "make tag". This would have achieved the same effect (and with the same space requirements too, at least for things like "bold uppercase A" in UTF-16), but with much greater flexibility (in that you could make _other_ characters bold too, and you could create combinations of the attributes not currently represented).
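To make the space claim concrete, here is a minimal sketch that counts UTF-16 code units. The "make bold" combining character does not exist, so U+0300 (COMBINING GRAVE ACCENT) is used purely as a stand-in for some hypothetical plane 0 combining character; the helper name `utf16_units` is also just illustrative.

```python
def utf16_units(text: str) -> int:
    """Return the number of 16-bit code units needed to store text in UTF-16."""
    # utf-16-le writes no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2

# U+1D400 MATHEMATICAL BOLD CAPITAL A is on plane 1, so it needs a surrogate pair.
bold_a = "\U0001D400"

# Hypothetical alternative: plain "A" followed by a plane 0 combining character.
# U+0300 is only a placeholder here; no "make bold" character is actually encoded.
a_plus_combining = "A" + "\u0300"

print(utf16_units(bold_a), utf16_units(a_plus_combining))
```

Both strings come out at two code units, which is the sense in which the two approaches cost the same for a case like "bold uppercase A".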
I still haven't figured out what "fullwidth" means though. I don't really understand in what way a "full width full stop" (FF0E) is different from a "full stop" (002E), etc. I _have_ downloaded, and read in entirety, the code chart document for FF00-FFEF, and nothing in that document explains to me why these characters are necessary. Does anyone have any clues on that one?

Thanks
Jill

-----Original Message-----
From: John Cowan [mailto:[EMAIL PROTECTED]
Sent: Monday, August 11, 2003 12:26 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Newbie Question - what are all those duplicated characters FOR?

[EMAIL PROTECTED] scripsit:

> Stefan has effectively dealt with SOME of my confusion, but questions
> remain. For example: between 1D49C (mathematical script capital A) and
> 1D49E (mathematical script capital C) we find 1D49D (<reserved>). What is it
> reserved for? I am aware that codepoint 212C is script capital B, but why
> does that justify leaving a "hole" in the codepoint space? Why not just omit
> "mathematical script capital B" without leaving a hole? (i.e. why not just
> go straight from A to C?).

Primarily for implementation simplicity. It's possible to convert between any of the mathematical "fonts" and any other, or the corresponding "normal" ones, with a simple offset plus a short table of exceptions. Code space on plane 1 just isn't that precious.

Similar things have been done throughout Unicode: for example, in the main Greek block, there is a hole where "capital letter final sigma" would be, since there is no such character: the final/non-final distinction is not made in capital letters.

> More questions. From E0020 to E007E we have "tag space" through to "tag
> tilde". These are copies of the Basic Latin block at 0020. I still don't
> know what they are for.

The tag characters are used to embed tags, specifically language tags, in contexts where markup is too heavyweight but it seems essential to record the language of a text.
One such application is in protocol design, where it is occasionally necessary to pass around human-readable strings within the protocol, and it is desirable to supply the correct string for a given language. All other uses are strongly discouraged. But if you have to do it, you can encode "en-us" (the language code for U.S. English) using <E0001, E0065, E006E, E002D, E0075, E0073>. For all purposes other than language identification, tag characters are ignored.

-- 
John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com
"In computer science, we stand on each other's feet." --Brian K. Reid
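The "en-us" encoding described above can be sketched in a few lines. This is only an illustration, assuming that each tag character sits at 0xE0000 plus the code of the Basic Latin character it copies, with E0001 as the leading language-tag marker; the function names `encode_language_tag` and `strip_tag_characters` are made up for this example.

```python
TAG_OFFSET = 0xE0000    # tag characters E0020-E007E mirror Basic Latin 0020-007E
LANGUAGE_TAG = 0xE0001  # marker that introduces a language tag

def encode_language_tag(lang: str) -> str:
    """Spell a language code such as "en-us" using tag characters."""
    for ch in lang:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"{ch!r} has no tag character counterpart")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(ch)) for ch in lang)

def strip_tag_characters(text: str) -> str:
    """Drop everything in E0000-E007F, as a process that ignores tags might."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)

tagged = encode_language_tag("en-us") + "Hello"
print([f"{ord(ch):04X}" for ch in encode_language_tag("en-us")])
print(strip_tag_characters(tagged))
```

The first print lists the same six code points given in the message, and the second shows that the visible text is unchanged once the tag characters are ignored.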

