On 7/15/2011 9:03 AM, Doug Ewell wrote:
Andrew West <andrewcwest at gmail dot com> replied to Michael Everson:

I think that having encoded symbols for control characters (which we
already have for some of them) is no bad thing, and the argument
about "too many characters" is not compelling, as there are only some
dozens of these characters encoded, not thousands and thousands or
anything.
I oppose encoding graphic clones of non-graphic characters on
principle, not because of how many there are.
I agree with Michael about a lot of things, and this isn't going to be
one of them.  The main arguments I am seeing in favor of encoding are:

1. Graphic symbols for control characters are needed so writers can
write about the control characters themselves using plain text.

When users outside the character encoding community start reporting such a need in great numbers, that would indicate that there might (might!) be a real requirement. The character coding community has had decades to figure out ways to manage without this - and the current occasion (the review of Apple's symbol fonts) is not a suitable context in which to suddenly drag in something that could have been addressed at any time over the last 20 years, had it been truly urgent.

I don't think there's any end to where this can go.  As Martin said,
eventually you'd need a meta-meta-character to talk about the
meta-character, and then it's not just a size problem, but an
infinite-looping problem.

What real users need is to show "hidden" characters. That need can be served by different mechanisms. There does not seem to be a consensus, though, on what the preferred approach should be, and implementations disagree. That kind of issue needs to be addressed differently, with the cooperation of the major implementers.
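
For concreteness, here is a minimal sketch of one such mechanism, in Python; the function name show_hidden is my own, not anyone's shipping implementation. It substitutes the existing Control Pictures at display time, leaving the stored text untouched:

    # Render-time substitution using the existing U+2400 block:
    # U+2400..U+241F are SYMBOL FOR NULL through SYMBOL FOR UNIT
    # SEPARATOR, and U+2421 is SYMBOL FOR DELETE. Only the string
    # handed to the renderer changes; the document data does not.
    def show_hidden(text: str) -> str:
        out = []
        for ch in text:
            cp = ord(ch)
            if cp < 0x20:            # C0 control character
                out.append(chr(0x2400 + cp))
            elif cp == 0x7F:         # DEL
                out.append('\u2421')
            else:
                out.append(ch)
        return ''.join(out)

    # Example: show_hidden('a\tb\nc') displays as 'a␉b␊c'

The point is that this lives entirely in the rendering path - nothing new needs to be encoded for it to work.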


2. The precedent was established by the U+2400 block.

I thought those were compatibility characters, in the original sense:
encoded because they were part of some pre-existing standard.  That's
not necessarily a precedent in itself to encode more characters that are
similar in nature.

Doug is entirely correct. These would be a precedent only if an extended set of other such symbols were found in use in some de facto character set. In that special case, an argument for compatibility with *that* character set could be made, and for it to succeed, it would have to be shown that the character set is widely used and that compatibility with it is of critical importance.

In addition, I would claim that experience has shown that the control code image characters are not widely used. That means any hope the early encoders may have had (and these characters go back to Unicode 1.0) that those symbols would be useful characters in their own right has simply not been borne out.


3. There aren't that many of them.

We regularly dismiss arguments of the form "But there's lots of room for
these in Unicode" when someone proposes to encode something that
shouldn't be there.  I don't see this as any different.

Correct.

The only time this argument is useful is in deciding between encoding the same character directly or as a character sequence. Using character sequences solely for reasons of encoding space, as opposed to because the elements are characters in their own right, has become irrelevant with the introduction of the 16 supplementary planes.
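
To illustrate that distinction, a small Python example: the same character can be encoded directly as precomposed U+00E9, or as the sequence U+0065 plus U+0301, and the two are canonically equivalent:

    import unicodedata

    direct   = '\u00E9'      # 'é' encoded directly, one code point
    sequence = 'e\u0301'     # 'e' + COMBINING ACUTE ACCENT

    assert direct != sequence   # distinct code point sequences...
    # ...but canonically equivalent: NFC composes the sequence
    assert unicodedata.normalize('NFC', sequence) == direct

Both forms are legitimate; the argument above only bears on which of the two is given a directly encoded form.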

Saving code space is likewise not a valid argument for excessive unification of certain symbols or punctuation characters - any decision there needs to be based on other facts.

Michael is responsible for adding many thousands of characters to
Unicode, so it's awkward for me to be debating character-encoding
principles with him, but there we are.




Well, in this business, no one's infallible.

A./
