Re: Superscript and Subscript Characters in General Use / Re: French Superscript Abbreviations Fit Plain Text Requirements

Marcel Schneider Mon, 23 Jan 2017 01:45:07 -0800

Happily, this thread is now arriving at a far better and very useful result.
A set of Unicode superscripts and subscripts turns out to be already promoted
by Microsoft in a fully validated way. From this we can go on to promote the
use of a set of Latin superscript letters. Relatedly, Microsoftʼs choice not
to support the OpenType rendering properties of U+2044 FRACTION SLASH (at
least in a Latin script context in Edge) turns out to be a fairly
user-friendly, practice-oriented option. That also helps to avoid holding
peopleʼs feet to the fire about U+2044.

On Wed, 28 Dec 2016 13:47:00 -0800, Asmus Freytag wrote:
[…]
> 
> Mathematical notation is a good example of such a mixed case: while 
> ordinary variables can be expressed in plain text with the help of 
> mathematical alphabets, the proper display of formulas requires markup. 
> Even Murray Sargent's plain text math is markup, albeit a very clever one 
> that re-uses conventions used for the inline presentation of mathematical 
> expression. (Where that is insufficient, it introduces additional 
> conventions, clearly extraneous to the content, and hence markup).
> 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m12/0119.html

Murray Sargentʼs Nearly Plain-Text Encoding of Mathematics (UnicodeMath) is,
in my opinion, a key gateway to understanding Unicode, and it has thus become
a key point in my communication about Unicode-supporting keyboard layouts.
See version 3.1:
http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v3.1.pdf

Thanks to Asmus Freytag for drawing our attention to it!

What makes this notation so important to this threadʼs issue is that it uses
Unicode superscripts and subscripts as a valid and parseable alternative to
the [La]TeX-style notation that uses markup ('^' and '_'), “since Unicode has
a full set of decimal subscripts and superscripts. As a practical matter,
numeric subscripts are typically entered using an underscore and the number
followed by a space or an operator” (p. 7).

These Unicode superscript and subscript characters are parseable and are
converted to formatted digits when the expression is built up. Hence they are
unambiguous, not random characters as is sometimes alleged. They “should be
rendered the same way that scripts of the corresponding script nesting level
would be rendered.” (p. 18)
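
To illustrate why these characters are machine-readable rather than random,
here is a minimal Python sketch that recovers the script level and the digit
value from the compatibility decomposition recorded in the Unicode Character
Database. The helper name script_digit is hypothetical, and this is not how
UnicodeMath or Word actually parse their input; it only shows that the
information is available to any parser.

```python
import unicodedata

def script_digit(ch):
    """Return ('super' | 'sub', digit value) for a script digit, else None.

    Hypothetical helper: it relies only on the compatibility decomposition,
    e.g. '<super> 0032' for U+00B2 SUPERSCRIPT TWO.
    """
    parts = unicodedata.decomposition(ch).split()
    if len(parts) != 2 or parts[0] not in ("<super>", "<sub>"):
        return None
    base = chr(int(parts[1], 16))
    if base not in "0123456789":
        return None  # a script letter or sign, not a digit
    return parts[0].strip("<>"), int(base)

if __name__ == "__main__":
    for ch in "\u00b2\u2087x":
        print(f"U+{ord(ch):04X}", script_digit(ch))
```

The same decomposition data is what the NFKC normalization form uses to fold
these characters back to ASCII digits.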

Although fractions are ordinarily written with ASCII digits and a slash,
U+2044 can be used to get skewed fractions (p. 5) built up in Microsoft Word
(where fractions can also be formatted using the math features). Combining
both schemes, the user may feel free to write fractions using superscripts
and subscripts around U+2044, as suggested in the already cited wiki, which
proposes adding a huge autocorrect list for quick input:
https://answers.microsoft.com/en-us/msoffice/wiki/msoffice_word-mso_other/styled-fractions-in-windows/4a07d5fa-2484-4e39-b1f3-70bb3eb0c332
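
As a rough sketch of the combined scheme (superscript numerator, U+2044,
subscript denominator), the following Python snippet builds such a fraction
from two non-negative integers. The function name styled_fraction is an
assumption made for this illustration; it is not taken from the wiki, which
relies on an autocorrect list instead.

```python
# Translation tables from ASCII digits to Unicode script digits.
# Superscript 1, 2 and 3 live in Latin-1 (U+00B9, U+00B2, U+00B3);
# the other superscripts and all subscripts are in U+2070..U+2089.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)
FRACTION_SLASH = "\u2044"

def styled_fraction(numerator, denominator):
    """Hypothetical helper: plain-text fraction for non-negative integers."""
    return (
        str(numerator).translate(SUPERSCRIPT_DIGITS)
        + FRACTION_SLASH
        + str(denominator).translate(SUBSCRIPT_DIGITS)
    )

if __name__ == "__main__":
    print(styled_fraction(3, 4))
    print(styled_fraction(11, 20))
```

The result does not depend on any OpenType feature: it displays with whatever
glyphs the font provides for the script digits and for U+2044.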

This is practice-oriented and user-friendly because relying only on the
OpenType font feature specified for U+2044 would dramatically restrict the
number of usable fonts. In Latin script that number traditionally runs to
several thousand, as opposed to the complex scripts for which HarfBuzz is
primarily intended, where the number of available typefaces is much smaller,
so that full conversion to OpenType is feasible. So I think that the correct
rendering of U+2044 in HarfBuzz mainly targets these complex scripts. In
other scripts like Latin, the feature would then be a nice side benefit, one
that potentially raises user expectations about professional (typographical)
ligature rendering.

At the other end, for drafts and even “for simple documentation purposes”,
“plain-text linearly formatted mathematical expressions can be used ‘as is’”
(p. 29). That can be extended to vulgar fractions in running text, and to
abbreviations.

This helps to understand that any font with inconsistent glyphs for Unicode
subscript and superscript digits is not Unicode conformant. The same applies
to superscript i and n (as mentioned in:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0093.html
). These inconsistent fonts donʼt conform to the Unicode Standard, which
specifies that there is no functional difference between those characters
that have the word SUPERSCRIPT in their name and those that donʼt:

TUS 9.0, §7.8, p. 327:
| The superscript forms of the i and n letters can be found in the
| Superscripts and Subscripts block (U+2070..U+209F). The fact that the latter 
| two letters contain the word “superscript” in their names instead of
| “modifier letter” is an historical artifact of original sources for the
| characters, and is not intended to convey a functional distinction in the
| use of these characters in the Unicode Standard.
http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G24762

Moreover, the Code Charts carry annotations on these two characters,
connecting them to the set of Unicode superscript Latin letters named
“MODIFIER LETTER”:

2071 SUPERSCRIPT LATIN SMALL LETTER I
* functions as a modifier letter
# <super> 0069
[…]
207F SUPERSCRIPT LATIN SMALL LETTER N
* functions as a modifier letter
# <super> 006E
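
The decomposition lines of these chart entries can be checked against the
Unicode Character Database from Python. This is only a sketch: the helper
name describe is hypothetical, and the values returned reflect whatever
Unicode version is bundled with the Python build in use.

```python
import unicodedata

def describe(code_point):
    """Hypothetical helper: (name, general category, decomposition)."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )

if __name__ == "__main__":
    # The two characters quoted from the Code Charts above.
    for cp in (0x2071, 0x207F):
        print(f"U+{cp:04X}", *describe(cp), sep=" | ")
```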

Accordingly, the user can count on a whole small alphabet (except q, which
has been rejected by arguing invented, imaginary allegations on behalf of the
UTC) displaying in a consistent way in all complete, conformant fonts, with a
running-text-like layout so far as the fonts have proportional advance
widths. To run a test, see the example in:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0093.html (again).
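
For readers who want to see which superscript small letters their own
environment knows about, the following Python sketch scans the Unicode
Character Database for characters whose compatibility decomposition is
<super> plus a basic Latin small letter. The function name superscript_forms
is an assumption made for this example. Which letters come back empty depends
on the Unicode version bundled with the Python build, so no expected output
is claimed here, and the scan says nothing about whether a given font renders
the letters consistently.

```python
import string
import sys
import unicodedata

def superscript_forms():
    """Map each of a..z to the characters that decompose to <super> + letter."""
    found = {letter: [] for letter in string.ascii_lowercase}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == "<super>":
            base = chr(int(parts[1], 16))
            if base in found:
                found[base].append(chr(cp))
    return found

if __name__ == "__main__":
    for letter, forms in superscript_forms().items():
        listing = " ".join(f"U+{ord(c):04X}" for c in forms) or "(none)"
        print(letter, listing)
```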

Trying to conclude so far (please feel free to correct), I now believe, and
will spread the word, that following Microsoft (a user-friendly corporation
eager to help everybody make the most of Unicode) the users of any word
processor and text editor are welcome to use the Unicode repertoire as they
need and like, while on the other hand the recommendations in TUS may be
considered a mere official discourse for encoding process management
purposes, with little to no real impact on actual practice. Hence, National
Bodies and user communities as well as developers may issue usage
recommendations of their own, to meet user expectations and propose working
methods in addition to, or as alternatives to, those provided by the
Standard.

Regards,
Marcel
