Re: C1 Control Pictures Proposal

2011-08-22 Thread Sean Leonard

On Aug 17, 2011, at 4:38 PM, Andrew West wrote:

 
 Unless you can show evidence that C1 control pictures are currently in
 use and that there is a clear demand from the user community to


On Aug 21, 2011, at 10:13 AM, Doug Ewell wrote:

 Perhaps it would help for you to do a quick survey of applications that 
 already make use of the existing C0 control pictures, and include the results 
 in your argument.  That might help convince some of us who feel the C0 
 pictures are only there for compatibility with previous character encodings

This is a reasonable request. In a follow-up post, or in any event in the
formal proposal, I shall include examples of the use of, and the demand for,
control pictures.

I would like to ask you/the list for the sources for C0 control pictures. They 
appear to be ANSI X3.32 and ISO 2047. (Also, FIPS Pub. 1-2, which consolidates 
ANSI X3.32 and some others.) Does anybody have these, and can you look the 
pictures up? In particular, X3.32 is withdrawn...

-Sean



Re: C1 Control Pictures Proposal

2011-08-21 Thread Sean Leonard
Hi Ken et al.,

On Aug 17, 2011, at 2:49 PM, Ken Whistler wrote:

 
 Further comments:
 
 On 8/13/2011 10:48 AM, Sean Leonard wrote:
 In accordance with this and other text in the Standard, it is not really 
 possible to assign glyphs uniformly and interchangeably to the code points 
 in U+0000-U+001F and U+0080-U+009F.
 
 Of course it is. The Unicode Standard has done so for years: they are called
 code chart display glyphs. What one cannot expect is that plain text renderers
 will display control characters as visible glyphs in a uniform fashion -- they
 aren't supposed to, because the control codes aren't graphic characters. That
 is, rather, what "show hidden" modes are all about, and there really aren't
 any constraints on the details of exactly how a "show hidden" implementation
 may choose to display the undisplayable, as it were.

Can you please explain which part of the Unicode Standard you are referring to?
Is there a "show hidden" mode or code point sequence in the Unicode Standard?
If by "code chart display glyphs" you mean the glyphs printed in the code chart
document itself for U+0080, that is beside the point. If you are referring to a
"show hidden code points" mode in an editor (such as a terminal emulator,
Emacs, Notepad++, or another editor), I understand what you are getting at, but
that is exactly what is unhelpful. As you point out, "there really aren't any
constraints on the details of exactly how a show hidden implementation may
choose to display the undisplayable"--and that is exactly the problem. One
advantage of my proposal is that fonts that provide glyphs for these code
points can make those glyphs visually consistent with the other graphic
characters (e.g., the same monospace dimensions, yet still readable). For those
who say "oh, just have an editor show [HOP] or whatever," that is exactly the
problem: the editor cannot show [HOP] in a uniform way along with the rest of
the glyphs that represent U+0000 - U+007F and U+00A0-U+00FF [modulo U+00A0 and
U+00AD]. How ironic is it that fonts can encode the characters U+0000-U+001F
(and space and delete) uniformly for display, yet can do no such thing for the
other half of these characters?
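
To make the asymmetry concrete, here is a minimal Python sketch of a "show
hidden" pass. It is illustrative only: the function name show_hidden and the
deliberately partial C1 abbreviation table are assumptions made for this
example, not anything defined by the Unicode Standard. C0 controls and DELETE
each map to one character in the Control Pictures block at U+2400, while each
C1 control has to fall back to a multi-character bracketed label.

```python
# Partial, illustrative table of C1 abbreviations (ISO 6429 style).
# A real tool would carry all 32 entries; unknown ones fall back to hex.
C1_ABBREVIATIONS = {0x81: "HOP", 0x85: "NEL", 0x8B: "PLD", 0x9B: "CSI"}


def show_hidden(text: str) -> str:
    """Make control characters visible in a string."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x1F:
            # C0 control: one Control Picture, U+2400 + code point.
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # DELETE has its own picture at U+2421.
            out.append(chr(0x2421))
        elif 0x80 <= cp <= 0x9F:
            # C1 control: no encoded picture, so use a bracketed label.
            label = C1_ABBREVIATIONS.get(cp, f"{cp:02X}")
            out.append(f"[{label}]")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(show_hidden("A\x1bB\x81C"))
```

In a monospace display the ESC picture occupies one cell, whereas [HOP]
occupies five, which is the non-uniformity described above.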

This is definitely not a confusion between glyphs and characters. This is about 
having character code points for a uniform representation of these characters 
as-displayed in interchange, so that two systems (e.g., an application and the 
graphics rendering subsystem of the operating system, or the graphics rendering 
subsystem of an operating system and the font software that the OS uses) can 
interchange data unambiguously.

The Unicode Standard does not dictate the precise glyphs; it only shows
representative glyphs. A font designer could choose among alternative glyphs
for the graphic character code point. For example, for U+001B -> U+241B ESCAPE,
the font designer could choose ESC (scrunched horizontally), ESC (diagonally),
^[ (scrunched horizontally; ^[ is a common legacy rendering of ESC), or ESC with
a box around it. But because the user has chosen that particular font in that
particular editor or rendering session, the user would be guaranteed that ESC
-> ^[ (scrunched) would be visually similar to ^\ (file separator, scrunched),
which would be visually similar to the C1s and to the graphic characters. No
such guarantee can currently be made without C1 Control Pictures.

 
  Variation selectors (sec. 16.4), for example, provide a mechanism for 
 specifying a restriction on the set of glyphs that are used to represent a 
 particular character [examples given of CJK ideographs and Mongolian 
 letters]. Variation selectors and other Unicode-defined control code points 
 are ill-suited to causing C1 values to be displayed, because C1 values have 
 no display representation in and of themselves.
 
 That whole discussion of variation selectors is beside the point. Variation
 sequences can only be defined for *base* characters. Base characters are a
 subset of graphic characters (see D51 in Chapter 3 of the Unicode Standard).
 Control characters aren't graphic characters. Hence they are not base
 characters, either, and could never be used in variation sequences, anyway.

Correct. As per above, C1 control characters lack graphical variations. Let's 
give them graphics. To display is to know.

-Sean

 
 --Ken
 
 





Re: C1 Control Pictures Proposal

2011-08-21 Thread Doug Ewell
Perhaps it would help for you to do a quick survey of applications that 
already make use of the existing C0 control pictures, and include the 
results in your argument.  That might help convince some of us who feel 
the C0 pictures are only there for compatibility with previous character 
encodings, and aren't really used by anyone, and that a new set of C1 
pictures would meet with similar disuse.


--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell





Re: C1 Control Pictures Proposal

2011-08-17 Thread Sean Leonard
On Aug 13, 2011, at 10:48 AM, Sean Leonard wrote:

 Greetings--hi all, I'm a new poster. I read on the unicode.org website that a 
 good way to gauge interest and get a proposal through the process is to 
 gather feedback and comments here before investing the time in a formal 
 proposal, so, here goes...
 
 This posting is to propose the addition of C1 Control Pictures to Unicode. It 
 is being proposed by me, Sean Leonard, with the advice and +1 of Frank da 
 Cruz.

Just putting a *bump* on this post. Any feedback, or, shall I go directly to a 
formal proposal? [Original proposal is in the last e-mail so I am not resending 
the whole thing.]

Thanks,

Sean



Re: C1 Control Pictures Proposal

2011-08-17 Thread Shriramana Sharma

On 08/17/2011 05:14 PM, Sean Leonard wrote:

Just putting a *bump* on this post. Any feedback, or, shall I go
directly to a formal proposal? [Original proposal is in the last
e-mail so I am not resending the whole thing.]


Is that the whole proposal? You realize you'll have to give sample glyphs,
Unicode Character Properties and the filled-out summary form?


[I'm saying this because you yourself said you're relatively new to 
Unicode (or is it just to the mailing list)...]


--
Shriramana Sharma



Re: C1 Control Pictures Proposal

2011-08-17 Thread Doug Ewell
Sean Leonard lists plus unicode at seantek dot com wrote:

 Just putting a *bump* on this post.

Speaking as an individual with personal opinions, and without a vote in
UTC or WG2 (but having followed Unicode for 18 years), I don't see the
need for these additional symbols.

The C0 pictures in the U+2400 block were encoded in Unicode 1.0,
apparently for compatibility with existing standards that included these
pictures.  No such standards seem to exist with C1 pictures.

It might be useful to provide specific examples of data analyzers that
employ the U+2400 characters to display C0 controls, which would be
likely to be updated in the future to support the newly added C1
pictures.
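
For reference, the kind of substitution such a data analyzer would perform can
be sketched in a few lines of Python. This is a minimal illustration under
stated assumptions, not code from any existing tool; the function name
c0_to_pictures is hypothetical. It relies only on the layout of the Control
Pictures block, where U+2400 through U+241F mirror U+0000 through U+001F and
U+2421 stands for DELETE.

```python
def c0_to_pictures(text: str) -> str:
    """Replace C0 controls and DELETE with their U+2400-block pictures."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x1F:
            # U+0000..U+001F -> U+2400..U+241F (same offset within the block)
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # DELETE -> U+2421 SYMBOL FOR DELETE
            out.append(chr(0x2421))
        else:
            # Graphic characters (and everything else) pass through unchanged.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(c0_to_pictures("line one\r\nline two\x1b[0m"))
```

SPACE is left untouched here; a tool that also wants it visible could map it to
U+2420 SYMBOL FOR SPACE. C1 controls pass through unchanged because no
corresponding pictures are encoded.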

The stated need to be able to discuss these characters in text never
sways me, as I have said before.  [PLD] works just fine.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell






Re: C1 Control Pictures Proposal

2011-08-17 Thread Ken Whistler

In general, I agree with Doug Ewell's assessment. I don't see a convincing
case here for the need to encode more control picture characters for C1
controls. There seems to be a confusion here between the need for
glyphs and the need for characters. Also, this would seem to me to
be a receding horizon kind of problem. The same arguments could be
(and have been) claimed for Unicode format controls, which also don't
have visible displays of their own.

Further comments:

On 8/13/2011 10:48 AM, Sean Leonard wrote:

In accordance with this and other text in the Standard, it is not really 
possible to assign glyphs uniformly and interchangeably to the code points in 
U+0000-U+001F and U+0080-U+009F.


Of course it is. The Unicode Standard has done so for years: they are called
code chart display glyphs. What one cannot expect is that plain text renderers
will display control characters as visible glyphs in a uniform fashion -- they
aren't supposed to, because the control codes aren't graphic characters. That
is, rather, what "show hidden" modes are all about, and there really aren't
any constraints on the details of exactly how a "show hidden" implementation
may choose to display the undisplayable, as it were.



  Variation selectors (sec. 16.4), for example, provide a mechanism for specifying a 
restriction on the set of glyphs that are used to represent a particular character [examples given 
of CJK ideographs and Mongolian letters]. Variation selectors and other Unicode-defined 
control code points are ill-suited to causing C1 values to be displayed, because C1 values have no 
display representation in and of themselves.


That whole discussion of variation selectors is beside the point. Variation
sequences can only be defined for *base* characters. Base characters are a
subset of graphic characters (see D51 in Chapter 3 of the Unicode Standard).
Control characters aren't graphic characters. Hence they are not base
characters, either, and could never be used in variation sequences, anyway.

--Ken





Re: C1 Control Pictures Proposal

2011-08-17 Thread Andrew West
On 13 August 2011 18:48, Sean Leonard lists+unic...@seantek.com wrote:

 The Unicode code points U+0000 through U+00FF share the equivalent values 
 from the ASCII Standard, ISO 646, ISO 6429, and ISO 8859-1. In many contexts, 
 it is desirable to display all of these code points/characters uniquely and 
 unambiguously. C0 Control Pictures are currently encoded in the Unicode 
 Standard at U+2400; that block currently covers the undisplayable code points 
 at U+0000-U+0020 (plus a few extra alternatives/additions). However, the 
 undisplayable characters in U+0080-U+00FF are left out.

 There are several business cases in which C1 Control Pictures are useful:
 1. Terminal emulators need them for debugging.
 2. Data analyzers need them so that each control code has a unique character
 that, when the graphics subsystem/text renderer renders it, is intended for
 display rather than for control effects.
 3. When communicating, engineers can distinguish between data without
 side-effects (i.e., control characters as pictures) and data that invokes
 side-effects (i.e., control characters used as control characters).
 4. There are use cases for historic or scholarly purposes, to encode and 
 discuss these characters in text, as distinct from invoking their 
 side-effects (and displaying nothing).
 5. To display all values in U+0000 - U+00FF as distinct _characters_, rather 
 than in hexadecimal representation (which makes deciphering the meaning of 
 the codes for graphic characters in the ASCII (G0) and ISO 8859-1 (G1) range 
 very difficult), in the same width and font as the rest of the graphic 
 characters.

 6. In support of 1-5, font designers can design fonts that support C1 Control 
 Pictures and that map glyphs to Unicode code points uniformly and 
 interchangeably (two key architectural goals of the Unicode Standard). 
 Without C1 Control Pictures, it is infeasible to provide graphical 
 representations of the C1 Control Characters. This is an asymmetry compared 
 to the C0 Control Pictures block in Unicode, and thus should be remedied.

It would probably be useful to read the WG2 Principles and Procedures document

http://std.dkuug.dk/JTC1/SC2/WG2/docs/n3902.pdf

particularly Annex H, "Criteria for encoding symbols", which states that:

"The fact that a symbol merely seems to be useful or potentially useful is
precisely not a reason to code it. Demonstrated usage, or demonstrated demand,
on the other hand, does constitute a good reason to encode the symbol."
(H10 on p.37)

Unless you can show evidence that C1 control pictures are currently in
use and that there is a clear demand from the user community to
represent them in plain text, it is unlikely that your proposal will
get very far.

Andrew