Re: C1 Control Pictures Proposal

Sean Leonard Sun, 21 Aug 2011 09:58:13 -0700

Hi Ken et al.,

On Aug 17, 2011, at 2:49 PM, Ken Whistler wrote:

> 
> Further comments:
> 
> On 8/13/2011 10:48 AM, Sean Leonard wrote:
>> In accordance with this and other text in the Standard, it is not really 
>> possible to assign glyphs uniformly and interchangeably to the code points 
>> in U+0000-U+001F and U+0080-U+009F.
> 
> Of course it is. The Unicode Standard has done so for years: they are called
> code chart display glyphs. What one cannot expect is that plain text renderers
> will display control characters as visible glyphs in a uniform fashion -- they
> aren't supposed to, because the control codes aren't graphic characters. That
> is, rather, what "show hidden" modes are all about, and there really aren't any
> constraints on the details of exactly how a show hidden implementation may
> choose to display the undisplayable, as it were.

Can you please point me to the part of the Unicode Standard you are referring
to? Is there a "show hidden" mode or code point sequence in the Unicode
Standard? If by "code chart display glyphs" you mean the glyphs printed in the
code charts themselves (for U+0080, say), that is beside the point. If you are
referring to a "show hidden code points" mode in an application (such as a
terminal emulator, Emacs, Notepad++, or another editor), I understand what you
are getting at, but that is exactly what is unhelpful. As you point out, "there
really aren't any constraints on the details of exactly how a show hidden
implementation may choose to display the undisplayable"--and that is exactly
the problem. One advantage of my proposal is that a font that provides glyphs
for these code points can make them visually consistent with its other graphic
characters (e.g., fitting monospace dimensions yet remaining readable). For
those who say "oh, just have an editor show [HOP] or whatever": the editor
cannot show [HOP] in a uniform way along with the rest of the glyphs that
represent U+0000-U+007F and U+00A0-U+00FF [modulo U+00A0 and U+00AD]. How
ironic is it that fonts can encode the characters U+0000-U+001F (and space and
delete) uniformly for display, through the Control Pictures at U+2400-U+2421,
yet can do no such thing for the other half of these characters?

This is definitely not a confusion between glyphs and characters. This is about 
having character code points for a uniform representation of these characters 
as-displayed in interchange, so that two systems (e.g., an application and the 
graphics rendering subsystem of the operating system, or the graphics rendering 
subsystem of an operating system and the font software that the OS uses) can 
interchange data unambiguously.

The Unicode Standard does not dictate the precise glyphs; it only shows 
representative glyphs. A font designer could choose among alternative glyphs 
for the graphic character code point. For example, for U+001B ESCAPE -> U+241B
SYMBOL FOR ESCAPE,
the font designer could choose ESC (scrunched horizontally), ESC (diagonally), 
^[ (scrunched horizontally--^[ is a common legacy rendering of ESC) or ESC with 
a box around it. But because the user has chosen that particular font in that 
particular editor or rendering session, the user would be guaranteed that ESC 
-> ^[ (scrunched) would be visually similar to ^\ (file separator, scrunched), 
which would be visually similar to the C1s and to the graphic characters. No 
such guarantee can currently be made without C1 Control Pictures.
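
To make the asymmetry concrete, here is a minimal Python sketch of the mapping an application can already perform for the C0 controls, space, and delete, using the existing Control Pictures block. The function names `to_control_picture` and `show_hidden` are hypothetical, chosen only for illustration. C1 values pass through unchanged because there is currently no code point to map them to.

```python
def to_control_picture(ch: str) -> str:
    """Map a C0 control, SPACE, or DELETE to its Control Picture.

    Anything else, including the C1 controls U+0080-U+009F, is returned
    unchanged, since no Control Pictures are encoded for them.
    """
    cp = ord(ch)
    if cp <= 0x1F:
        # U+0000-U+001F map linearly onto U+2400-U+241F.
        return chr(0x2400 + cp)
    if cp == 0x20:
        return "\u2420"  # SYMBOL FOR SPACE
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    return ch


def show_hidden(text: str) -> str:
    """Replace every mappable character in text with its picture."""
    return "".join(to_control_picture(ch) for ch in text)
```

With this sketch, `show_hidden("\x1b[0m")` yields U+241B followed by `[0m`, and the font decides how U+241B looks. Passing `"\x85"` (NEL) returns it untouched, which is where an editor has to fall back on its own ad hoc rendering.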

> 
>>  Variation selectors (sec. 16.4), for example, "provide a mechanism for 
>> specifying a restriction on the set of glyphs that are used to represent a 
>> particular character [examples given of CJK ideographs and Mongolian 
>> letters]." Variation selectors and other Unicode-defined control code points 
>> are ill-suited to causing C1 values to be displayed, because C1 values have 
>> no "display representation" in and of themselves.
> 
> That whole discussion of variation selectors is beside the point. Variation
> sequences can only be defined for *base* characters. Base characters are a
> subset of graphic characters (see D51 in Chapter 3 of the Unicode Standard).
> Control characters aren't graphic characters. Hence they are not base
> characters, either, and could never be used in variation sequences, anyway.

Correct. As per above, C1 control characters lack graphical variations. Let's 
give them graphics. To display is to know.
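
For anyone who wants to check the classification we both agree on, a small sketch using Python's standard `unicodedata` module follows. The helper name `is_control` is hypothetical. It relies only on the General_Category property: the C0 and C1 controls are Cc, while the existing Control Pictures are ordinary symbols (So) and therefore graphic characters.

```python
import unicodedata


def is_control(ch: str) -> bool:
    """True if ch has General_Category Cc (a control code)."""
    return unicodedata.category(ch) == "Cc"


# The 32 C1 control code points, U+0080-U+009F.
C1_CONTROLS = [chr(cp) for cp in range(0x80, 0xA0)]

# Every C1 value is a control code, not a graphic character.
all_c1_are_controls = all(is_control(ch) for ch in C1_CONTROLS)

# U+241B SYMBOL FOR ESCAPE is a graphic character (category So).
escape_picture_category = unicodedata.category("\u241b")
```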

-Sean

> 
> --Ken
> 
>
