On 7/9/2012 11:04 PM, Jukka K. Korpela wrote:
2012-07-10 5:32, Asmus Freytag wrote:
There are many characters that are used in professional mathematical
typesetting (division slash being one of them) that need to be narrowly
distinguished from other, roughly similar characters.
Typographic differences can be made at glyph selection level, too, or
even in font design and choice of font. Typesetting systems like TeX
and derivatives have been very successful along such lines.
TeX and similar systems can get the correct appearance, but they do not
have the same benefit of a universal encoding of the semantic
distinction that underlies these variations in appearance.
Such narrowly defined characters are not aimed at the general user, and
it's totally irrelevant whether or not such a character ever becomes
"popular".
Popularity is relative to a population. When I wrote that “narrow
semantics does not make characters popular”, relating to the case of
DIVISION SLASH, I referred to popularity among people who could
conceivably have use for the characters. I don’t think there’s much
actual use of DIVISION SLASH in the wild. And this was about a case
where the distinction is not only semantic (the Unicode standard does
not actually describe the semantics except implicitly, via things like
the character's Unicode name and General Category) but also has, or may
have, a direct impact on rendering.
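The name/General Category point can be checked directly; as an illustrative sketch (not part of the original mail), Python's standard unicodedata module exposes exactly those two properties for the ordinary slash and DIVISION SLASH:

```python
import unicodedata

# U+002F SOLIDUS (the ordinary keyboard slash) and U+2215 DIVISION SLASH
# look nearly identical, but carry different names and General Categories:
# SOLIDUS is punctuation (Po), DIVISION SLASH a math symbol (Sm).
for ch in ["\u002F", "\u2215"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  category={unicodedata.category(ch)}")
```

This is essentially all the "semantics" the standard records for the pair; everything else is implicit in usage.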
I don't know, I would ask mathematical publishers whether they use
ordinary or division slash.
Very early in the design cycle for Unicode there
was a request for encoding of a decimal period, in distinction to a full
stop. The problem here is that there is no visual distinction.
This is more or less a vicious circle, and the starting point isn’t
even true. In British usage, the decimal point is often a somewhat
raised dot, above the baseline. But even if we assume that no
distinction *had been made* before the decision, the decision itself
implied that no distinction *can be made* by choice of character.
Encoding the same appearance (shape) as two separate characters is
something that the Unicode standard reserves to well-motivated
exceptions, such as the multiple encoding of the shape "E" for the
Latin, Greek and Cyrillic scripts. You don't need to look further than
the issues raised with spoofing of internet identifiers to see that
there are strong downsides to duplicate encoding. This is particularly
true when the distinctions in usage are mere notational conventions and
not as fundamental as script membership.
If a different decision had been made, people could choose to use a
decimal point character, or they could keep using just the ambiguous
FULL STOP character. Font designers could make them identical, or they
could make them different. But most probably, most people would not
even be aware of the matter: they would keep pressing the keyboard key
labeled with “.” – that is, the decimal point character would not have
much popularity. In British typesetting, people would probably still
use whatever methods they now use to produce raised dots.
A nice argument can be made for encoding a *raised* decimal dot (if it's
not representable by any number of other raised dots already encoded).
Clearly, in the days of lead typography, a British style decimal dot
would have been something that was a distinct piece of lead from a
period. In the end, no such request was made.
Unicode has relatively consistently refused to duplicate encodings in
such circumstances, because the point about Unicode is not that one
should be able to encode information about the intent that goes beyond
what can be made visible by rendering the text. Instead, the point about
Unicode is to provide a way to unambiguously define enough of the text
so that it becomes "legible". How legible text is then "understood" is
another issue.
That’s a nice compact description of the principle, but perhaps the
real reasons also include the desire to avoid endless debates over
“semantics”. Some semantic differences, like the use of a character as
a punctuation symbol vs. as a mathematical symbol, are relatively
clear. Most semantic differences that can be made are not that clear
at all.
Being able to encode an intent that is not directly visible to a reader
of a rendered text has issues that go beyond the niceties of debating
semantics. There are some cases where the downsides of that are (nearly)
unavoidable, and duplicate encoding is - in the end - the better answer.
But notational conventions usually don't qualify, because it's the
sharing of that convention between reader and writer that makes the
notation what it is.
Because of that, there was never any discussion of whether the ! would
have to be re-encoded as "factorial". It was not.
This implies that if anyone thinks that the factorial symbol should
look different from a normal exclamation mark, to avoid ambiguity (as
in the sentence “The result is n!”), he cannot do that at the
character level.
He can do so on a stylistic level, or a notational level (using a
different convention, perhaps adopting the convention of ending that
statement with a sentence-ending period, as in "The result is n!.").
A large number of mathematical and other symbols have originated as
other characters used for special purposes, then styled to have
distinctive shapes, later identified as separate symbols. For example,
N-ARY SUMMATION ∑ is now mostly visually different from GREEK CAPITAL
LETTER SIGMA Σ, though it was originally just the Greek letter used in
a specific meaning and context.
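The sigma/summation pair is one of the cases where Unicode did encode the distinction, precisely because the shapes and properties had diverged. A small sketch (mine, not from the original mail) using Python's unicodedata module shows the two separately encoded characters:

```python
import unicodedata

# U+03A3 GREEK CAPITAL LETTER SIGMA and U+2211 N-ARY SUMMATION are
# separately encoded: one is an uppercase letter (Lu), the other a
# mathematical symbol (Sm), and fonts are free to draw them differently.
for ch in ["\u03A3", "\u2211"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  category={unicodedata.category(ch)}")
```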
Correct, and at some point, such notational advances lead to new symbols
and new characters. There is a (very short) pipeline of mathematical
symbols that have been recently introduced and might get encoded when
they gain critical acceptance.
A principle that refuses to “re-encode” characters for semantic
distinctions seems to put a stop on such development. But of course
new characters are still being developed from old characters for
various purposes and can be encoded. They just need to have some
visual identity different from the old characters from the very start,
to have a chance of getting encoded.
Correct, the point of differentiation requires not only a different
interpretation, but a distinct appearance as well. Not to recognize that
is an often-practiced fallacy based on taking too literally the mantra
that "Unicode encodes the semantics".
The proper thing to do would be to add these usages to the list of
examples of known contextually defined usages of punctuation characters;
they are common enough that it's worth pointing them out in order to
overcome a bit of the inherent bias from Anglo-Saxon usage.
So what would be needed for this? I previously suggested annotations like
: also used to denote division
and
÷ also used to denote subtraction
But perhaps the former should be a little longer:
: also used to denote division and ratio
(especially since the use for ratio is more official and probably more
common).
Essentially,
A./