Re: C1 Control Pictures Proposal
On Aug 17, 2011, at 4:38 PM, Andrew West wrote:

Unless you can show evidence that C1 control pictures are currently in use and that there is a clear demand from the user community to

On Aug 21, 2011, at 10:13 AM, Doug Ewell wrote:

Perhaps it would help for you to do a quick survey of applications that already make use of the existing C0 control pictures, and include the results in your argument. That might help convince some of us who feel the C0 pictures are only there for compatibility with previous character encodings

This is a reasonable request. In a follow-up post or in any event in the formal proposal, I shall include examples of use of and/or demand for the representation of control pictures.

I would like to ask you/the list for the sources for C0 control pictures. They appear to be ANSI X3.32 and ISO 2047. (Also, FIPS Pub. 1-2, which consolidates ANSI X3.32 and some others.) Does anybody have these, and can you look the pictures up? In particular, X3.32 is withdrawn...

-Sean
Re: C1 Control Pictures Proposal
Hi Ken et al.,

On Aug 17, 2011, at 2:49 PM, Ken Whistler wrote:

Further comments:

On 8/13/2011 10:48 AM, Sean Leonard wrote:

In accordance with this and other text in the Standard, it is not really possible to assign glyphs uniformly and interchangeably to the code points in U+0000-U+001F and U+0080-U+009F.

Of course it is. The Unicode Standard has done so for years: they are called code chart display glyphs. What one cannot expect is that plain text renderers will display control characters as visible glyphs in a uniform fashion -- they aren't supposed to, because the control codes aren't graphic characters. That is, rather, what show hidden modes are all about, and there really aren't any constraints on the details of exactly how a show hidden implementation may choose to display the undisplayable, as it were.

Can you please explain where in the Unicode standard you are referring to? Is there a show hidden mode or code point sequence in the Unicode Standard? If you are referring to code chart display glyphs meaning the glyphs in the literal document for U+0080, that is beside the point. If you are referring to a show hidden code points mode in an editor (such as a terminal emulator, Emacs, Notepad++, or another editor), I understand what you are getting at, but that is exactly what is unhelpful. As you point out, there really aren't any constraints on the details of exactly how a show hidden implementation may choose to display the undisplayable--and that is exactly the problem.

One advantage of my proposal is that fonts that provide glyphs for these code points can have glyphs that are visually similar (e.g., in monospace dimensions yet remain readable) between that code point and other graphic characters. For those who say oh, just have an editor show [HOP] or whatever, that is exactly the problem: the editor cannot show [HOP] in a uniform way along with the rest of the glyphs that represent U+0000 - U+007F and U+00A0-U+00FF [modulo U+00A0 and U+00AD].

How ironic is it that fonts can encode the characters U+0000-U+001F (and space and delete) uniformly for display, yet can do no such similar thing for the other half of these characters?

This is definitely not a confusion between glyphs and characters. This is about having character code points for a uniform representation of these characters as-displayed in interchange, so that two systems (e.g., an application and the graphics rendering subsystem of the operating system, or the graphics rendering subsystem of an operating system and the font software that the OS uses) can interchange data unambiguously.

The Unicode Standard does not dictate the precise glyphs; it only shows representative glyphs. A font designer could choose among alternative glyphs for the graphic character code point. For example, for U+001B - U+241B ESCAPE, the font designer could choose ESC (scrunched horizontally), ESC (diagonally), ^[ (scrunched horizontally--^[ is a common legacy rendering of ESC) or ESC with a box around it. But because the user has chosen that particular font in that particular editor or rendering session, the user would be guaranteed that ESC - ^[ (scrunched) would be visually similar to ^\ (file separator, scrunched), which would be visually similar to the C1s and to the graphic characters. No such guarantee can currently be made without C1 Control Pictures.

Variation selectors (sec. 16.4), for example, provide a mechanism for specifying a restriction on the set of glyphs that are used to represent a particular character [examples given of CJK ideographs and Mongolian letters]. Variation selectors and other Unicode-defined control code points are ill-suited to causing C1 values to be displayed, because C1 values have no display representation in and of themselves.

That whole discussion of variation selectors is beside the point. Variation sequences can only be defined for *base* characters. Base characters are a subset of graphic characters (see D51 in Chapter 3 of the Unicode Standard). Control characters aren't graphic characters. Hence they are not base characters, either, and could never be used in variation sequences, anyway.

Correct. As per above, C1 control characters lack graphical variations. Let's give them graphics. To display is to know.

-Sean

--Ken
Re: C1 Control Pictures Proposal
Perhaps it would help for you to do a quick survey of applications that already make use of the existing C0 control pictures, and include the results in your argument. That might help convince some of us who feel the C0 pictures are only there for compatibility with previous character encodings, and aren't really used by anyone, and that a new set of C1 pictures would meet with similar disuse.

-- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
Re: C1 Control Pictures Proposal
On Aug 13, 2011, at 10:48 AM, Sean Leonard wrote:

Greetings--hi all, I'm a new poster. I read on the unicode.org website that a good way to gauge interest and get a proposal through the process is to gather feedback and comments here before investing the time in a formal proposal, so, here goes...

This posting is to propose the addition of C1 Control Pictures to Unicode. It is being proposed by me, Sean Leonard, with the advice and +1 of Frank da Cruz.

Just putting a *bump* on this post. Any feedback, or, shall I go directly to a formal proposal? [Original proposal is in the last e-mail so I am not resending the whole thing.]

Thanks,
Sean
Re: C1 Control Pictures Proposal
On 08/17/2011 05:14 PM, Sean Leonard wrote:

Just putting a *bump* on this post. Any feedback, or, shall I go directly to a formal proposal? [Original proposal is in the last e-mail so I am not resending the whole thing.]

Is that all the proposal? You realize you'll have to give sample glyphs, Unicode Character Properties and the filled out summary form? [I'm saying this because you yourself said you're relatively new to Unicode (or is it just to the mailing list)...]

-- Shriramana Sharma
Re: C1 Control Pictures Proposal
Sean Leonard lists plus unicode at seantek dot com wrote:

Just putting a *bump* on this post.

Speaking as an individual with personal opinions, and without a vote in UTC or WG2 (but having followed Unicode for 18 years), I don't see the need for these additional symbols.

The C0 pictures in the U+2400 block were encoded in Unicode 1.0, apparently for compatibility with existing standards that included these pictures. No such standards seem to exist with C1 pictures.

It might be useful to provide specific examples of data analyzers that employ the U+2400 characters to display C0 controls, which would be likely to be updated in the future to support the newly added C1 pictures.

The stated need to be able to discuss these characters in text never sways me, as I have said before. [PLD] works just fine.

-- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
Re: C1 Control Pictures Proposal
In general, I agree with Doug Ewell's assessment. I don't see a convincing case here for the need to encode more control picture characters for C1 controls. There seems to be a confusion here between the need for glyphs and the need for characters.

Also, this would seem to me to be a receding horizon kind of problem. The same arguments could be (and have been) claimed for Unicode format controls, which also don't have visible displays of their own.

Further comments:

On 8/13/2011 10:48 AM, Sean Leonard wrote:

In accordance with this and other text in the Standard, it is not really possible to assign glyphs uniformly and interchangeably to the code points in U+0000-U+001F and U+0080-U+009F.

Of course it is. The Unicode Standard has done so for years: they are called code chart display glyphs. What one cannot expect is that plain text renderers will display control characters as visible glyphs in a uniform fashion -- they aren't supposed to, because the control codes aren't graphic characters. That is, rather, what show hidden modes are all about, and there really aren't any constraints on the details of exactly how a show hidden implementation may choose to display the undisplayable, as it were.

Variation selectors (sec. 16.4), for example, provide a mechanism for specifying a restriction on the set of glyphs that are used to represent a particular character [examples given of CJK ideographs and Mongolian letters]. Variation selectors and other Unicode-defined control code points are ill-suited to causing C1 values to be displayed, because C1 values have no display representation in and of themselves.

That whole discussion of variation selectors is beside the point. Variation sequences can only be defined for *base* characters. Base characters are a subset of graphic characters (see D51 in Chapter 3 of the Unicode Standard). Control characters aren't graphic characters. Hence they are not base characters, either, and could never be used in variation sequences, anyway.
--Ken
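The distinction drawn above between control characters and graphic characters can be checked against the Unicode Character Database. The following is a minimal Python sketch, not part of the original thread; the helper name is_control is made up for illustration. It assumes only the standard unicodedata module and shows that the C0 and C1 code points carry General_Category Cc, while an existing control picture such as U+241B is an ordinary symbol (So).

```python
import unicodedata


def is_control(ch: str) -> bool:
    """Return True if ch has General_Category Cc (hypothetical helper)."""
    return unicodedata.category(ch) == "Cc"


# C0 controls U+0000-U+001F and C1 controls U+0080-U+009F.
c0 = [chr(cp) for cp in range(0x00, 0x20)]
c1 = [chr(cp) for cp in range(0x80, 0xA0)]

all_c0_are_controls = all(is_control(ch) for ch in c0)
all_c1_are_controls = all(is_control(ch) for ch in c1)

# U+241B SYMBOL FOR ESCAPE is a graphic character, not a control.
escape_picture_category = unicodedata.category("\u241b")
escape_picture_name = unicodedata.name("\u241b")
```

Because the pictures are graphic characters in their own right, a renderer can draw them like any other symbol, whereas the Cc code points themselves have no glyph that a plain text renderer is expected to show.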
Re: C1 Control Pictures Proposal
On 13 August 2011 18:48, Sean Leonard lists+unic...@seantek.com wrote:

The Unicode code points U+0000 through U+00FF share the equivalent values from the ASCII Standard, ISO 646, ISO 6429, and ISO 8859-1. In many contexts, it is desirable to display all of these code points/characters uniquely and unambiguously. C0 Control Pictures are currently encoded in the Unicode Standard at U+2400; that block currently covers the undisplayable code points at U+0000-U+0020 (plus a few extra alternatives/additions). However, the undisplayable characters in U+0080-U+00FF are left out.

There are several business cases in which C1 Control Pictures are useful:

1. Terminal emulators need them for debugging.
2. Data analyzers need them so they can have a unique character that when the graphics subsystem/text renderers render each character, is intended for display rather than for control effects.
3. Engineers can distinguish when communicating between the data without side-effects (i.e., control characters as pictures), and the data that invokes side-effects (i.e., control characters used as control characters).
4. There are use cases for historic or scholarly purposes, to encode and discuss these characters in text, as distinct from invoking their side-effects (and displaying nothing).
5. To display all values in U+0000 - U+00FF as distinct _characters_, rather than in hexadecimal representation (which makes deciphering the meaning of the codes for graphic characters in the ASCII (G0) ISO 8859-1 (G1) range very difficult), in the same width and font as the rest of the graphic characters.
6. In support of 1-5, font designers can design fonts that support C1 Control Pictures and that map glyphs to Unicode code points uniformly and interchangeably (two key architectural goals of the Unicode Standard).

Without C1 Control Pictures, it is infeasible to provide graphical representations of the C1 Control Characters. This is an asymmetry compared to the C0 Control Pictures block in Unicode, and thus should be remedied.

It would probably be useful to read the WG2 Principles and Procedures document http://std.dkuug.dk/JTC1/SC2/WG2/docs/n3902.pdf, particularly Annex H, Criteria for encoding symbols, which states that:

The fact that a symbol merely seems to be useful or potentially useful is precisely not a reason to code it. Demonstrated usage, or demonstrated demand, on the other hand, does constitute a good reason to encode the symbol. (H10 on p.37)

Unless you can show evidence that C1 control pictures are currently in use and that there is a clear demand from the user community to represent them in plain text, it is unlikely that your proposal will get very far.

Andrew
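The asymmetry described in the quoted proposal can be illustrated with a short Python sketch. This is not code from the thread: the function name show_controls and the <XX> hexadecimal fallback for C1 are assumptions chosen for illustration. Only the code point arithmetic is taken from the existing Control Pictures block, where U+2400 + n is the picture for C0 control n and U+2421 is SYMBOL FOR DELETE.

```python
# Hypothetical names; only the U+2400 offsets come from the Unicode block.
CONTROL_PICTURES_BASE = 0x2400
SYMBOL_FOR_DELETE = "\u2421"


def show_controls(text: str) -> str:
    """Replace control characters with something visible (a sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x1F:
            # C0 control: a dedicated picture exists at U+2400 + cp.
            out.append(chr(CONTROL_PICTURES_BASE + cp))
        elif cp == 0x7F:
            # DELETE has its own picture at U+2421.
            out.append(SYMBOL_FOR_DELETE)
        elif 0x80 <= cp <= 0x9F:
            # C1 control: no encoded picture, so this sketch falls back
            # to an assumed multi-character hex label such as <85>.
            out.append(f"<{cp:02X}>")
        else:
            out.append(ch)
    return "".join(out)


visible = show_controls("\x1b[0m\x85")
```

Each C0 control becomes exactly one character, so column alignment survives in a monospace display, while each C1 control expands to a four-character label. Spaces are left untouched here, although U+2420 SYMBOL FOR SPACE is available for callers who want it.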