On 11/5/2024 12:31 PM, Phil Smith III via Unicode wrote:

I assume you’ve seen https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts, which discusses what is and isn’t available as super/subscripts (henceforth “ss”) in Unicode. That surprised me—I would have thought that ss were markup, not characters, so there’s more of it implemented already than I’d expected.

The consensus that emerged over the first several decades of Unicode's development treats these forms somewhat ambiguously.

In mathematical notation, any character can be a super- or subscript, so you find multiple scripts and symbols used that way, with no limit in principle on what additional characters some specialty may adopt and super/subscript for some purpose. You also have things like subscripts on subscripts and similarly complex layouts. In that context it is definitely appropriate to treat subscripting as a generic operation and not to try to encode some subset of the possible results of that operation. You could never encode all the forms that are ever used (or available for use) in mathematical notation, so for that purpose, encoding any further explicit subscript forms doesn't help.
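
As a small illustration of my own (TeX-style markup, chosen only because it is familiar), nesting is routine, and only a generic operation can express it:

    % a subscript carrying its own subscript, plus a nested superscript
    $x_{i_1} + a_{n_k}^{2^m}$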

There is generic use of (mostly) superscript numbers in ordinary text, for things like footnote markers. These are also best handled as generic operations (via styles), particularly as they relate to document structure that already goes beyond plain text.

There are other notations, mainly phonetic, that have super/subscript forms but do not need recursive subscripting or all the other interesting features of mathematical layout and formatting. In many of them, the super- or subscript form acts pretty much like any other letter in the notation, except for its shape. Common to these notations is that there is a fixed set of such shapes; they don't even cover a full basic alphabet (that Unicode is getting close to having a full superscript alphabet is the result of overlapping uses).

For these cases there's a benefit in having a robust plain-text representation, so that "words" aren't required to use styling to be understood. That's the driving case behind encoding these forms. Ultimately the realization was that a universal character encoding could not be "one size fits all" when it comes to serving wildly diverging styles of usage.
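
To see what that plain-text robustness looks like in practice, here is a minimal sketch in Python (the transcription "tʰʷ" is just an illustration I picked, not from the original discussion). The superscript letters are ordinary encoded characters, so the "word" needs no styling at all:

    import unicodedata

    # an aspirated, labialized "t": a base letter plus two modifier letters
    word = "t\u02B0\u02B7"
    for ch in word:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+0074  LATIN SMALL LETTER T
    # U+02B0  MODIFIER LETTER SMALL H
    # U+02B7  MODIFIER LETTER SMALL W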

Another example of this dichotomy involves the distinction between mathematics and text. Ordinary plain text does not carry font information, and it is fully acceptable to render it in any font that supports the letters in question. That even goes for styles that aren't easily readable to everyday users. For example, text in the Latin script can be rendered in a Fraktur font that many people may have difficulty deciphering or reading fluently. No matter: you haven't changed the meaning of the text by doing that. And the selection of possible fonts is near infinite. Some font variations are generic enough that they can be applied to many scripts; others may be limited in practice to some specific alphabet.

In math notation, you have the situation that mathematicians have used the contrast between different font shapes to carry meaning. In some conventions, Fraktur shapes are used to indicate that a variable is a vector and not a scalar, for example. There are a handful of font styles that are used in this way, a fairly fixed set, and usually covering a limited set of characters as well. Because the operation is not fully generic, it is possible to cover it with explicitly encoded characters. At that point, there's the benefit of preserving that distinction in plain text.
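
As a concrete illustration (a small Python sketch of my own, not part of the original point): the styled forms used this way are separately encoded characters, so the meaning-carrying distinction survives in plain text:

    import unicodedata

    # plain, bold, Fraktur and double-struck "A" are four distinct characters
    for ch in ("A", "\U0001D400", "\U0001D504", "\U0001D538"):
        print(f"U+{ord(ch):05X}  {unicodedata.name(ch)}")
    # U+00041  LATIN CAPITAL LETTER A
    # U+1D400  MATHEMATICAL BOLD CAPITAL A
    # U+1D504  MATHEMATICAL FRAKTUR CAPITAL A
    # U+1D538  MATHEMATICAL DOUBLE-STRUCK CAPITAL A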

In fact, it is possible, this way, to render a very large subset of mathematical notation in an (almost) plain text form. Incidentally, that is not all that dissimilar from the concept of markdown: a plain text stream with a few chosen conventions, in the math case about the use of parentheses, plus dedicating some characters to function as subscript and superscript "operators". (All the other math operators, such as integrals or radical signs, trigger their own formatting, thus obviating the need to encode that explicitly.)
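
Something along those lines might look like this (a sketch of my own; the choice of "_" and "^" as the dedicated operator characters is just one possible convention, the one used by linear formats such as UnicodeMath):

    # styled variables are real characters; "_" and "^" act as the
    # subscript/superscript operators of the near-plain-text convention
    expr = "\U0001D44E_1 + \U0001D44F^2 = \U0001D509(\U0001D465)"
    print(expr)  # 𝑎_1 + 𝑏^2 = 𝔉(𝑥)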

Having the characters for all shape variants used for variables encoded directly makes this near-plain-text form very powerful. Again, what is a useful generic solution for ordinary text isn't as workable for a notational system, and vice versa. The emerging insight was that Unicode should strive to make reasonable accommodations for both, but in a way that focuses on the central needs and features of each.

If you look just at the encoding, though, you come away with a sense of apparent duplication and also seeming incompleteness: the additions for phonetic notations will never cover the generic use in math, while the few styled alphabets for math do nothing for general text use. The key is to recognize which notation or use case is supported by what, and then things make a whole lot more sense.

A./