Greetings--hi all, I'm a new poster. I read on the unicode.org website that a 
good way to gauge interest and get a proposal through the process is to gather 
feedback and comments here before investing the time in a formal proposal, so, 
here goes...

This posting is to propose the addition of C1 Control Pictures to Unicode. It 
is being proposed by me, Sean Leonard, with the advice and +1 of Frank da Cruz.

Many years ago (in 1998), Frank da Cruz proposed a large number of additional 
characters for terminal emulation and the like. The proposals can be found on 
the web and in the mailing list archive:
 ADDITIONAL CONTROL PICTURES FOR UNICODE 
ftp://kermit.columbia.edu/kermit/ucsterminal/control.txt
 TERMINAL GRAPHICS FOR UNICODE 
ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt
 HEX BYTE PICTURES FOR UNICODE 
ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt
Subject lines (1998):
 Terminal Graphics Proposal
 Terminal Graphics Draft 2


The proposal I would like to make here is much more modest: this proposal is 
only for the inclusion of C1 Control Pictures into the Unicode Standard. Frank 
explained to me that his original mega-proposals were rejected. However, I 
looked through the "Archive of Notices of Non-Approval" and was unable to find 
an explicit rejection of his proposals. In any event, if one reads through the 
old e-mail threads from 1998, one will find that the C1 Control Pictures subset 
of the proposals received a (luke)warm welcome.

RATIONALE

The Unicode code points U+0000 through U+00FF have the same values as the 
corresponding characters in ASCII, ISO 646, ISO 6429, and ISO 8859-1. In many 
contexts, it is desirable to display all of these code points/characters 
uniquely and unambiguously. C0 Control Pictures are currently encoded in the 
Unicode Standard at U+2400; that block currently covers the undisplayable code 
points at U+0000-U+0020 and U+007F (plus a few extra alternatives/additions). 
However, the undisplayable characters in U+0080-U+00FF are left out.
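
To make the asymmetry concrete, here is a minimal sketch in Python 3 of how a 
display tool can map C0 codes onto the existing Control Pictures block. The 
function names are made up for illustration; the only facts relied on are that 
U+2400-U+2420 mirror U+0000-U+0020 and that U+2421 is SYMBOL FOR DELETE. Note 
that the C1 range simply falls through, because there is nothing to map it to.

```python
def to_control_picture(ch: str) -> str:
    """Return the Control Pictures character for ch, or ch unchanged."""
    cp = ord(ch)
    if cp <= 0x20:
        # C0 controls and SPACE map linearly onto U+2400..U+2420.
        return chr(0x2400 + cp)
    if cp == 0x7F:
        # DELETE has its own picture at U+2421.
        return "\u2421"
    # Everything else is returned as-is. This includes the C1 controls
    # (U+0080..U+009F), for which no picture is currently encoded.
    return ch


def visualize(text: str) -> str:
    """Replace every character that has a control picture with that picture."""
    return "".join(to_control_picture(c) for c in text)


if __name__ == "__main__":
    # ESC [ (7-bit CSI) becomes visible; the 8-bit CSI (U+009B) does not.
    print(visualize("\x1b[1m") + " vs. " + repr(visualize("\x9b1m")))
```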

There are several business cases in which C1 Control Pictures are useful:
1. Terminal emulators need them for debugging.
2. Data analyzers need a unique character for each control code that text 
renderers and graphics subsystems treat as intended for display, rather than 
as a trigger for control effects.
3. Engineers can distinguish, when communicating, between data without 
side-effects (i.e., control characters as pictures) and data that invokes 
side-effects (i.e., control characters used as control characters).
4. There are use cases for historic or scholarly purposes, to encode and 
discuss these characters in text, as distinct from invoking their side-effects 
(and displaying nothing).
5. All values in U+0000 - U+00FF can be displayed as distinct _characters_, in 
the same width and font as the rest of the graphic characters, rather than in 
hexadecimal representation (which makes deciphering the meaning of the codes 
for graphic characters in the ASCII (G0) & ISO 8859-1 (G1) range very 
difficult).
6. In support of 1-5, font designers can design fonts that support C1 Control 
Pictures and that map glyphs to Unicode code points uniformly and 
interchangeably (two key architectural goals of the Unicode Standard). Without 
C1 Control Pictures, it is infeasible to provide graphical representations of 
the C1 Control Characters. This is an asymmetry compared to the C0 Control 
Pictures block in Unicode, and thus should be remedied.

Quoting from the Unicode Standard 6.0.0, sec. 16.1:
There are 65 code points [C0, C1, delete] set aside...for compatibility with 
the C0 and C1 control codes defined in the ISO/IEC 2022 framework.
The Unicode Standard provides for the intact interchange of these code points, 
neither adding to nor subtracting from their semantics. ... [i]n the absence of 
specific application uses, they may be interpreted according to the control 
function semantics specified in ISO/IEC 6429:1992.

In accordance with this and other text in the Standard, it is not really 
possible to assign glyphs uniformly and interchangeably to the code points in 
U+0000-U+001F and U+0080-U+009F. Variation selectors (sec. 16.4), for example, 
"provide a mechanism for specifying a restriction on the set of glyphs that are 
used to represent a particular character [examples given of CJK ideographs and 
Mongolian letters]." Variation selectors and other Unicode-defined control code 
points are ill-suited to causing C1 values to be displayed, because C1 values 
have no "display representation" in and of themselves.
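
As a quick illustration of the point above, the following sketch uses Python's 
standard unicodedata module to compare the properties of a C0 control, a C1 
control, SOFT HYPHEN, and an existing control picture. The helper name 
describe is made up for this example, and the exact results depend on the 
Unicode version bundled with the Python build.

```python
import unicodedata


def describe(cp: int) -> tuple[str, str]:
    """Return (general category, character name) for a code point.

    Control characters have no character name in the Unicode Character
    Database, so a placeholder string is returned for them.
    """
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.name(ch, "<no name>")


if __name__ == "__main__":
    # ESC, CSI, SHY, and the existing picture for ESC.
    for cp in (0x001B, 0x009B, 0x00AD, 0x241B):
        category, name = describe(cp)
        print(f"U+{cp:04X}  {category}  {name}")
```

The controls report a control category and no name, while the control picture 
is an ordinary symbol that any font can supply a glyph for.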


PROPOSED CHARACTERS WITH NOTES

C1 Control Pictures
Hex  Name  Symbol for...

80   PAD   PADDING CHARACTER
Allegedly not in ISO 6429. (Need to check historical versions; other sources.)

81   HOP   HIGH OCTET PRESET
Allegedly not in ISO 6429. (Need to check historical versions; other sources.)

82   BPH   BREAK PERMITTED HERE

83   NBH   NO BREAK HERE

84   IND   INDEX
"Move the active position one line down, to eliminate ambiguity about the 
meaning of LF. Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429 (1986 
and 1991 respectively for ECMA-48)." (from Wikipedia)

85   NEL   NEXT LINE

86   SSA   START OF SELECTED AREA

87   ESA   END OF SELECTED AREA

88   HTS   CHARACTER TABULATION SET

89   HTJ   CHARACTER TABULATION WITH JUSTIFICATION

8A   VTS   LINE TABULATION SET

8B   PLD   PARTIAL LINE DOWN

8C   PLU   PARTIAL LINE UP

8D   RI    REVERSE LINE FEED

8E   SS2   SINGLE SHIFT TWO

8F   SS3   SINGLE SHIFT THREE

90   DCS   DEVICE CONTROL STRING

91   PU1   PRIVATE USE ONE

92   PU2   PRIVATE USE TWO

93   STS   SET TRANSMIT STATE

94   CCH   CANCEL CHARACTER

95   MW    MESSAGE WAITING

96   SPA   START OF GUARDED AREA

97   EPA   END OF GUARDED AREA

98   SOS   START OF STRING

99   SGCI  SINGLE GRAPHIC CHARACTER INTRODUCER
Allegedly not in ISO 6429. (Need to check historical versions; other sources.)

9A   SCI   SINGLE CHARACTER INTRODUCER

9B   CSI   CONTROL SEQUENCE INTRODUCER

9C   ST    STRING TERMINATOR

9D   OSC   OPERATING SYSTEM COMMAND

9E   PM    PRIVACY MESSAGE

9F   APC   APPLICATION PROGRAM COMMAND

A0   NBSP  NO-BREAK SPACE
Purpose is to show NBSP as visually distinct from SP (SPACE).

AD   SHY   SOFT HYPHEN
Show - with SHY above or around it, similar to the glyph that the Unicode 
Standard code charts show for U+00AD.
(SHY may be the most "controversial" character. See above for rationale--the 
objective is to provide visually distinct characters throughout the 
U+0000-U+00FF range. U+00AD is visually identical to the U+002D hyphen-minus; 
the only distinction is a "control" distinction, which is non-visual. Hence, 
the distinction should be made visually, with a distinct code point.)
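
Until such pictures exist, tools have to fall back on something like the 
following. This is only a sketch in Python 3: the abbreviations are the ones 
in the list above, while the function name and the angle-bracket convention 
are made up for illustration. It also shows the cost of the workaround, since 
each placeholder is several characters wide instead of one cell.

```python
# Abbreviations for 0x80..0x9F, in code order, taken from the list above.
C1_ABBREVIATIONS = (
    "PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3 "
    "DCS PU1 PU2 STS CCH MW SPA EPA SOS SGCI SCI CSI ST OSC PM APC"
).split()

# Sanity check: one abbreviation per C1 code.
assert len(C1_ABBREVIATIONS) == 32


def c1_placeholder(ch: str) -> str:
    """Return a textual stand-in such as '<CSI>' for a proposed character."""
    cp = ord(ch)
    if 0x80 <= cp <= 0x9F:
        return f"<{C1_ABBREVIATIONS[cp - 0x80]}>"
    if cp == 0xA0:
        return "<NBSP>"
    if cp == 0xAD:
        return "<SHY>"
    # Anything outside the proposed set is passed through untouched.
    return ch


if __name__ == "__main__":
    sample = "\x9b1mbold\x9b0m\x85co\xadoperate\xa0now"
    print("".join(c1_placeholder(c) for c in sample))
```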


UNICODE CODE POINT ASSIGNMENTS

Unicode code point assignments are not explicitly advocated for in this 
initial, informal proposal. While it would be nice to place these characters 
adjacent to or in the Control Pictures block at U+2400, there are not enough 
free code points there to shoehorn in all 34 characters listed above.


MODIFICATIONS TO THE UNICODE STANDARD

It is proposed that section 15.6, Technical Symbols, be extended to discuss 
both C0 and C1 controls.


-Sean Leonard
SeanTek


