This has bitten more than a few people. For political reasons having to do with keeping the names synchronized with ISO 10646, the name field (field 1) carries no real name for the control characters, only the placeholder <control>. That is because (at least in theory) people could have other semantics for those characters.
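As a rough illustration, here is a minimal Python sketch of how a program might cope with this: it splits a UnicodeData record on semicolons and, when field 1 is just <control>, falls back to field 10 (the Unicode 1.0 Name field). The sample records are copied from the listing in this message; the names SAMPLE and display_name are made up for the example, and the fallback policy is only one reasonable choice, not something the Unicode data files prescribe.

```python
# Sample records copied from the UnicodeData listing in this message.
SAMPLE = """\
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
0080;<control>;Cc;0;BN;;;;;N;;;;;
"""

NAME_FIELD = 1       # current character name (only "<control>" for controls)
OLD_NAME_FIELD = 10  # "Unicode 1.0 Name" field


def display_name(record):
    """Return (code point, name), using field 10 when field 1 is <control>.

    Hypothetical helper for illustration; not part of any Unicode library.
    """
    fields = record.split(";")
    code_point = int(fields[0], 16)
    name = fields[NAME_FIELD]
    if name == "<control>":
        # Field 10 can itself be empty (for example for 0080),
        # so callers still need to handle an empty string.
        name = fields[OLD_NAME_FIELD]
    return code_point, name


names = dict(display_name(line) for line in SAMPLE.splitlines() if line)
```

With these three records, names maps 0x09 and 0x0A to their field 10 values and 0x80 to an empty string, so a caller still has to decide what to do with controls that have no field 10 value.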
Field 10 (called Unicode 1.0 Name) contains names for most of those characters, and should be used for your purpose. See, for example, http://www.unicode.org/Public/BETA/Unicode3.2/UnicodeData-3.2.0d1.html where it says:

"This is the old name as published in Unicode 1.0. This name is only provided when it is significantly different from the current name for the character. The value of field 10 for control characters does not always match the Unicode 1.0 names. Instead, field 10 contains ISO 6429 names for control functions, for printing in the code charts."

Thus the data from http://www.unicode.org/Public/BETA/Unicode3.2/UnicodeData-3.2.0d8.txt has the following records. Note the use of parentheses for some (but not all) abbreviated names, and that some of the names follow the updated ISO 6429 names, e.g. CHARACTER TABULATION instead of the better-known HORIZONTAL TABULATION (HT).

0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;
0002;<control>;Cc;0;BN;;;;;N;START OF TEXT;;;;
0003;<control>;Cc;0;BN;;;;;N;END OF TEXT;;;;
0004;<control>;Cc;0;BN;;;;;N;END OF TRANSMISSION;;;;
0005;<control>;Cc;0;BN;;;;;N;ENQUIRY;;;;
0006;<control>;Cc;0;BN;;;;;N;ACKNOWLEDGE;;;;
0007;<control>;Cc;0;BN;;;;;N;BELL;;;;
0008;<control>;Cc;0;BN;;;;;N;BACKSPACE;;;;
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
000B;<control>;Cc;0;S;;;;;N;LINE TABULATION;;;;
000C;<control>;Cc;0;WS;;;;;N;FORM FEED (FF);;;;
000D;<control>;Cc;0;B;;;;;N;CARRIAGE RETURN (CR);;;;
000E;<control>;Cc;0;BN;;;;;N;SHIFT OUT;;;;
000F;<control>;Cc;0;BN;;;;;N;SHIFT IN;;;;
0010;<control>;Cc;0;BN;;;;;N;DATA LINK ESCAPE;;;;
0011;<control>;Cc;0;BN;;;;;N;DEVICE CONTROL ONE;;;;
0012;<control>;Cc;0;BN;;;;;N;DEVICE CONTROL TWO;;;;
0013;<control>;Cc;0;BN;;;;;N;DEVICE CONTROL THREE;;;;
0014;<control>;Cc;0;BN;;;;;N;DEVICE CONTROL FOUR;;;;
0015;<control>;Cc;0;BN;;;;;N;NEGATIVE ACKNOWLEDGE;;;;
0016;<control>;Cc;0;BN;;;;;N;SYNCHRONOUS IDLE;;;;
0017;<control>;Cc;0;BN;;;;;N;END OF TRANSMISSION BLOCK;;;;
0018;<control>;Cc;0;BN;;;;;N;CANCEL;;;;
0019;<control>;Cc;0;BN;;;;;N;END OF MEDIUM;;;;
001A;<control>;Cc;0;BN;;;;;N;SUBSTITUTE;;;;
001B;<control>;Cc;0;BN;;;;;N;ESCAPE;;;;
001C;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR FOUR;;;;
001D;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR THREE;;;;
001E;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR TWO;;;;
001F;<control>;Cc;0;S;;;;;N;INFORMATION SEPARATOR ONE;;;;
007F;<control>;Cc;0;BN;;;;;N;DELETE;;;;
0080;<control>;Cc;0;BN;;;;;N;;;;;
0081;<control>;Cc;0;BN;;;;;N;;;;;
0082;<control>;Cc;0;BN;;;;;N;BREAK PERMITTED HERE;;;;
0083;<control>;Cc;0;BN;;;;;N;NO BREAK HERE;;;;
0084;<control>;Cc;0;BN;;;;;N;;;;;
0085;<control>;Cc;0;B;;;;;N;NEXT LINE (NEL);;;;
0086;<control>;Cc;0;BN;;;;;N;START OF SELECTED AREA;;;;
0087;<control>;Cc;0;BN;;;;;N;END OF SELECTED AREA;;;;
0088;<control>;Cc;0;BN;;;;;N;CHARACTER TABULATION SET;;;;
0089;<control>;Cc;0;BN;;;;;N;CHARACTER TABULATION WITH JUSTIFICATION;;;;
008A;<control>;Cc;0;BN;;;;;N;LINE TABULATION SET;;;;
008B;<control>;Cc;0;BN;;;;;N;PARTIAL LINE FORWARD;;;;
008C;<control>;Cc;0;BN;;;;;N;PARTIAL LINE BACKWARD;;;;
008D;<control>;Cc;0;BN;;;;;N;REVERSE LINE FEED;;;;
008E;<control>;Cc;0;BN;;;;;N;SINGLE SHIFT TWO;;;;
008F;<control>;Cc;0;BN;;;;;N;SINGLE SHIFT THREE;;;;
0090;<control>;Cc;0;BN;;;;;N;DEVICE CONTROL STRING;;;;
0091;<control>;Cc;0;BN;;;;;N;PRIVATE USE ONE;;;;
0092;<control>;Cc;0;BN;;;;;N;PRIVATE USE TWO;;;;
0093;<control>;Cc;0;BN;;;;;N;SET TRANSMIT STATE;;;;
0094;<control>;Cc;0;BN;;;;;N;CANCEL CHARACTER;;;;
0095;<control>;Cc;0;BN;;;;;N;MESSAGE WAITING;;;;
0096;<control>;Cc;0;BN;;;;;N;START OF GUARDED AREA;;;;
0097;<control>;Cc;0;BN;;;;;N;END OF GUARDED AREA;;;;
0098;<control>;Cc;0;BN;;;;;N;START OF STRING;;;;
0099;<control>;Cc;0;BN;;;;;N;;;;;
009A;<control>;Cc;0;BN;;;;;N;SINGLE CHARACTER INTRODUCER;;;;
009B;<control>;Cc;0;BN;;;;;N;CONTROL SEQUENCE INTRODUCER;;;;
009C;<control>;Cc;0;BN;;;;;N;STRING TERMINATOR;;;;
009D;<control>;Cc;0;BN;;;;;N;OPERATING SYSTEM COMMAND;;;;
009E;<control>;Cc;0;BN;;;;;N;PRIVACY MESSAGE;;;;
009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;

Personally, I think that this arrangement is error-prone. The UTC would be far better off putting the control code names in field 1 instead, and simply documenting that field 1 contains the character names for non-control characters and the ISO 6429 names for control characters. Fewer people like yourselves would be unpleasantly surprised.

Mark
—————
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com

----- Original Message -----
From: "Jarkko Hietaniemi" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, February 03, 2002 11:03
Subject: names of the control characters

> A question: Perl offers a way to use Unicode characters by name:
>
>   use charnames ':full';
>
>   $a = "fooba\N{LATIN SMALL LETTER SHARP S}";
>
> but I noticed that the C0 and C1 control characters no more have
> Official Unicode names, all they have left is <control> in the name
> field and the Unicode 1.0 name. This means that things like
>
>   $b = "x\N{HORIZONTAL TABULATION}y";
>
> won't work. What's the story behind the "unnaming" of the C0 and C1?
>
> --
> $jhi++; # http://www.iki.fi/jhi/
>         # There is this special biologist word we use for 'stable'.
>         # It is 'dead'. -- Jack Cohen
>

